Commit Graph

149978 Commits

Author SHA1 Message Date
Usman Nadeem 9396c3ec7b [AArch64][SVE] Remove assertion/range check for i16 values during immediate selection
The assertion can fail in some cases when an i16 constant is promoted
to i32.

e.g. in the added test case the value `i16 -32768` is within the range
of i16 but the assert fails when the constant is promoted to positive
`i32 32768` by an earlier call to DAG.getConstant().

Differential Revision: https://reviews.llvm.org/D107880

Change-Id: I2f6179783cbc9630e6acab149a762b43c65664de
2021-08-11 14:50:20 -07:00
Amara Emerson 2c1789bc8c [AArch64][GlobalISel] Add ptradd_immed_chain combine to post-legalizer combiner. 2021-08-11 13:59:23 -07:00
Akira Hatanaka 643ce61fb3 [ObjC][ARC] Don't form a StoreStrong call if it is unsafe to move the
release call

findSafeStoreForStoreStrongContraction checks whether it's safe to move
the release call to the store by inspecting all instructions between the
two, but was ignoring retain instructions. This was causing objects to
be released and deallocated before they were retained.

rdar://81668577
2021-08-11 13:50:19 -07:00
Usman Nadeem a7c4e9b1f7 [InstSimplify] Eliminate vector reverse of a splat vector
experimental.vector.reverse(splat(X)) -> splat(X)

Differential Revision: https://reviews.llvm.org/D107793

Change-Id: Id29ba88fd669ff8686712e96b1bdc46dda5b853c
2021-08-11 11:27:58 -07:00
Rong Xu 4c5909ba83 [SampleFDO] Add two passes of MIRAddFSDiscriminatorsPass
This patch adds Pass1 of MIRADDFSDiscriminatorsPass before register
allocation, and Pass2 of MIRAddFSDiscriminatorsPass before
Block-Placement. This is still under --enable-fs-discrmininator
option (default false).

This would reduce the turn-around time for FSAFDO transition.

Differential Revision: https://reviews.llvm.org/D104579
2021-08-11 11:11:04 -07:00
Kazu Hirata 63c566b1fd [DWARF] Remove extractFast (NFC)
The last use was removed on Dec 13, 2016 in commit
c8c1032c0c.  This patch repurposes the
function comment for the other variant of extractFast.
2021-08-11 09:55:00 -07:00
Sanjay Patel a0a9c9e188 [InstCombine] avoid breaking up min/max (cmp+sel) idioms
This is a quick fix for a motivating case that looks like this:
https://godbolt.org/z/GeMqzMc38

As noted, we might be able to restore the min/max patterns
with select folds, or we just wait for this to become easier
with canonicalization to min/max intrinsics.
2021-08-11 12:48:11 -04:00
Yolanda Chen 8fa16cc628 [LTO][lld] Add lto-pgo-warn-mismatch option
When enable CSPGO for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX)
due to source changes (e.g. `#if` code runs for profile generation but not for profile use)
To disable it we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast clang uses option ”-Wno-backend-plugin“ to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option.

Add "lto-pgo-warn-mismatch" option to lld COFF/ELF to help turn on/off the profile mismatch warnings explicitly when build with ThinLTO and CSPGO.

Differential Revision: https://reviews.llvm.org/D104431
2021-08-11 09:45:55 -07:00
Fraser Cormack 885be620f9 [LegalizeTypes][NFC] Remove else-after-return
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107890
2021-08-11 16:48:28 +01:00
David Green 8c50b5fbfe [ARM] Add extra debug messages for validating live outs. NFC
We are running into more and more cases where the liveouts of low
overhead loops do not validate. Add some extra debug messages to make it
clearer why.
2021-08-11 10:35:53 +01:00
Wang, Pengfei 6c4809825d Revert "[lld] Add lto-pgo-warn-mismatch option"
This reverts commit 0cfb00a1c9.
2021-08-11 16:25:42 +08:00
Cullen Rhodes 1fe0e6a380 [AArch64][SME] Support ptrue(s) in streaming mode
The ptrue and ptrues instructions are legal in streaming mode, missed in
D106272.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D107807
2021-08-11 07:49:36 +00:00
Rainer Orth 7bbbf29561 [ELF] Don't emit SHF_GNU_RETAIN on Solaris
The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris.

Initially, as reported in Bug 49437, it caused dozens of testsuite failures
on both sparc and x86.  The objects were marked as `ELFOSABI_NONE`, but
`SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag
(in the range for OS-specific values) is `SHF_SUNW_ABSENT` with a
completely different semantics, which confuses Solaris `ld` very much.

Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris
`ld` doesn't support, causing it to SEGV and break the build.  The linker
is currently being hardened to not accept non-native OS ABIs to avoid this.

The need for linker support is already documented in
`clang/include/clang/Basic/AttrDocs.td`, but not currently checked.

This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all.

Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D107747
2021-08-11 09:27:51 +02:00
madhur13490 61526b1262 [DAG] Reword comment for EnforceNodeIdInvariant and InvalidateNodeId. NFC.
Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D107845
2021-08-11 12:14:28 +05:30
Yolanda Chen 0cfb00a1c9 [lld] Add lto-pgo-warn-mismatch option
When enable CSPGO for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX).
To disable it we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast clang uses option ”-Wno-backend-plugin“ to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option.

Add this "lto-pgo-warn-mismatch" option to lld to help turn on/off the profile mismatch warnings explicitly when build with ThinLTO and CSPGO.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D104431
2021-08-11 14:43:26 +08:00
Petr Hosek 389dc94d4b [InstrProfiling] Generate runtime hook for Fuchsia
When none of the translation units in the binary have been instrumented
we shouldn't need to link the profile runtime. However, because we pass
-u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still
be pulled in and incur some overhead. On Fuchsia which uses runtime
counter relocation, it also means that we cannot reference the bias
variable unconditionally.

This change modifies the InstrProfiling pass to pull in the profile
runtime only when needed by declaring the __llvm_profile_runtime symbol
in the translation unit only when needed. For now we restrict this only
for Fuchsia, but this can be later expanded to other platforms. This
approach was already used prior to 9a041a7522, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation,
but that limitation may no longer apply, and it certainly doesn't apply
on platforms like Fuchsia.

Differential Revision: https://reviews.llvm.org/D98061
2021-08-10 23:21:15 -07:00
Petr Hosek c0c1c3cf93 Revert "[InstrProfiling] Emit bias variable eagerly"
This reverts commit 6660cec568 since
it was superseded by https://reviews.llvm.org/D98061.
2021-08-10 23:21:15 -07:00
Johannes Doerfert fc32a5c87d [Attributor][NFC] Try to make the windows build bots happy
Failed for some reason, potentially because of the inner type
declaration in combination with the `using`. This might help.

Failure:
https://lab.llvm.org/buildbot/#/builders/127/builds/15432
2021-08-11 01:11:37 -05:00
Johannes Doerfert e7e3585cde [Attributor][FIX] Handle recurrences (PHIs) in AAPointerInfo explicitly
PHI nodes are not pass through but change their value, we have to
account for that to avoid missing stores.

Follow up for D107798 to fix PR51249 for good.

Differential Revision: https://reviews.llvm.org/D107808
2021-08-11 00:49:54 -05:00
Johannes Doerfert 96da6dd6ba [Attributor][FIX] Only avoid visiting PHI uses multiple times (PR51249)
AAPointerInfoFloating needs to visit all uses and some multiple times if
we go through PHI nodes. Attributor::checkForAllUses keeps a visited set
so we don't recurs endlessly. We now allow recursion for non-phi uses so
we track all pointer offsets via PHI nodes properly without endless
recursion.

This replaces the first attempt D107579.

Differential Revision: https://reviews.llvm.org/D107798
2021-08-11 00:49:54 -05:00
Johannes Doerfert e0c5d83a92 [OpenMP][FIX] Disabled optimizations have to be made known
To avoid simplification with wrong constants we need to make sure we
know that we won't perform specific optimizations based on the users
request. The non-SPMDzation and non-CustomStateMachine flags did only
prevent the final transformation but allowed to value simplification
to go ahead.

Differential Revision: https://reviews.llvm.org/D107862
2021-08-11 00:49:53 -05:00
Craig Topper a8ae41fb51 [SelectionDAGBuilder] Save iterator to avoid second DenseMap lookup. NFC
We were calling find and then using operator[]. Instead keep the
iterator from find and use it to get the value.

Just happened to notice while investigating how we decide what extends
to use between basic blocks.
2021-08-10 22:37:48 -07:00
Mircea Trofin 510402c2c8 [NFC][MLGO] 'Use' variable used for asserts 2021-08-10 19:55:17 -07:00
Christopher Di Bella c874dd5362 [llvm][clang][NFC] updates inline licence info
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.

Differential Revision: https://reviews.llvm.org/D107528
2021-08-11 02:48:53 +00:00
Hongtao Yu 68d6c3e448 [CSSPGO] Additional cleanup as a follow-up to D107838 2021-08-10 19:04:20 -07:00
Hongtao Yu 78523516bc [CSSPGO] Do not use getCanonicalFnName in pseudo probe descriptor decoding
Pseudo probe descriptors are created very early in the pipeline where function names just come from the front end and are not yet decorated. So calling getCanonicalFnName on the function names in probe desc is basically a no-op, which also addes a depenency from MC to ProfileData unnessesarily.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D107838
2021-08-10 18:24:39 -07:00
Amara Emerson 7ec4ce157b [AArch64][GlobalISel] Relax oneuse restriction for PTR_ADD chain combining to check addressing legality.
With contributions by Sebastian Neubauer

Differential Revision: https://reviews.llvm.org/D105676
2021-08-10 16:41:18 -07:00
Sushma Unnibhavi 7bdce6bcbd [M68k][GloballSel] RegBankSelect implementation
Implementation of RegBankSelect for the M68k backend.

Differential Revision: https://reviews.llvm.org/D107542
2021-08-10 15:24:43 -07:00
Adrian Prantl a353edb8d6 Simplify coro::salvageDebugInfo() (NFC-ish)
This patch removes the hand-rolled implementation of salvageDebugInfo
for cast and GEPs and replaces it with a call into
llvm::salvageDebugInfoImpl().

A side-effect of this is that additional redundant convert operations
are introduced, but those don't have any negative effect on the
resulting DWARF expression.

rdar://80227769

Differential Revision: https://reviews.llvm.org/D107384
2021-08-10 15:21:18 -07:00
Adrian Prantl d6b6880172 Streamline the API of salvageDebugInfoImpl (NFC)
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.

  1. Change the return value to I.getOperand(0). Currently users of
     salvageDebugInfoImpl() assume that the first operand is
     I.getOperand(0). This patch makes this information explicit. A
     nice side-effect of this change is that it allows us to salvage
     expressions such as add i8 1, %a in the future.

  2. Factor out the creation of a DIExpression and return an array of
     DIExpression operations instead. This change allows users that
     call salvageDebugInfoImpl() in a loop to avoid the costly
     creation of temporary DIExpressions and to defer the creation of
     a DIExpression until the end.

This patch does not change any functionality.

rdar://80227769

Differential Revision: https://reviews.llvm.org/D107383
2021-08-10 15:21:18 -07:00
Nikita Popov 17db125b48 [MemCpyOpt] Optimize MemoryDef insertion
When converting a store into a memset, we currently insert the new
MemoryDef after the store MemoryDef, which requires all uses to be
renamed to the new def using a whole block scan. Instead, we can
insert the new MemoryDef before the store and not rename uses,
because we know that the location is immediately overwritten, so
all uses should still refer to the old MemoryDef. Those uses will
get renamed when the old MemoryDef is actually dropped, which is
efficient.

I expect something similar can be done for some of the other MSSA
updates in MemCpyOpt. This is an alternative to D107513, at least
for this particular case.

Differential Revision: https://reviews.llvm.org/D107702
2021-08-10 21:28:29 +02:00
Thomas Johnson b821086876 [ARC] Add codegen for count trailing zeros intrinsic for the ARC backend
Differential Revision: https://reviews.llvm.org/D107828
2021-08-10 12:07:35 -07:00
Fangrui Song 76093b1739 [InlineAdvisor] Add single quotes around caller/callee names
Clang diagnostics refer to identifier names in quotes.
This patch makes inline remarks conform to the convention.
New behavior:

```
% clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c
a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline]
int bar(int a) { return foo(a); }
                        ^
```

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D107791
2021-08-10 11:51:31 -07:00
Sanjay Patel b267d3ce8d [InstCombine] avoid infinite loops from min/max canonicalization
The intrinsics have an extra chunk of known bits logic
compared to the normal cmp+select idiom. That allows
folding the icmp in each case to something better, but
that then opposes the canonical form of min/max that
we try to form for a select.

I'm carving out a narrow exception to preserve all
existing regression tests while avoiding the inf-loop.
It seems unlikely that this is the only bug like this
left, but this should fix:
https://llvm.org/PR51419
2021-08-10 14:42:37 -04:00
Jinsong Ji 2cfd427626 [AIX] Don't crash on unimplemented lowerRelativeReference
We may call lowerRelativeReference in MC to determine whether target
supports this lowering. We should return nullptr instead of crashing
when we haven't implemented the real lowering.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D107830
2021-08-10 17:43:06 +00:00
Matt Arsenault d719f1c3cc AMDGPU: Add alloc priority to global ranges
The requested register class priorities weren't respected
globally. Not sure why this is a target option, and not just the
expected behavior (recently added in
1a6dc92be7). This avoids an allocation
failure when many wide tuple spills are introduced. I think this is a
workaround since I would not expect the allocation priority to be
required, and only a performance hint. The allocator should be smarter
about when only a subregister needs to be spilled and restored.

This does regress a couple of degenerate store stress lit tests which
shouldn't be too important.
2021-08-10 13:12:34 -04:00
Matt Arsenault 1b41945da0 RegAllocGreedy: Add spaces between registers in debug message 2021-08-10 13:12:34 -04:00
Craig Topper 6f5edc3487 [RISCV] Fold (add (select lhs, rhs, cc, 0, y), x) -> (select lhs, rhs, cc, x, (add x, y))
Similar for sub except sub isn't commutative.

Modify the existing and/or/xor folds to also work on ISD::SELECT
and not just RISCVISD::SELECT_CC. This is needed to make sure
we do this transform before type legalization turns i32 add/sub
into add/sub+sign_extend_inreg on RV64. If we don't do this before
that, the sign_extend_inreg will still be after the select.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107603
2021-08-10 09:02:56 -07:00
Sanjay Patel e260e10c4a [InstSimplify] fold min/max with limit constant
This is already done within InstCombine:
https://alive2.llvm.org/ce/z/MiGE22

...but leaving it out of analysis makes it
harder to avoid infinite loops there.
2021-08-10 10:57:25 -04:00
Sanjay Patel 188832f419 Revert "[InstSimplify] fold min/max with limit constant; NFC"
This reverts commit f43859b437.
This is not NFC, so I'll try again without that mistake in the commit message.
2021-08-10 10:50:09 -04:00
Sanjay Patel f43859b437 [InstSimplify] fold min/max with limit constant; NFC
This is already done within InstCombine:
https://alive2.llvm.org/ce/z/MiGE22

...but leaving it out of analysis makes it
harder to avoid infinite loops there.
2021-08-10 10:43:07 -04:00
David Green 013030a0b2 [AArch64] Correct sinking of shuffles to adds/subs
This was checking extends as shuffles, where as we should be checking
the operands. This helps sink the shuffles, creating more addl/subl
instructions.

Differential Revision: https://reviews.llvm.org/D107623
2021-08-10 13:25:42 +01:00
Tim Northover 5ad0860899 AArch64: support @llvm.va_copy in GISel 2021-08-10 13:11:03 +01:00
Konstantin Schwarz 64bef13f08 [GlobalISel] Look through truncs and extends in narrowScalarShift
If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be
shifted individually by the constant shift amount.

However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization
code was used, producing intermediate shifts that are potentially illegal on some targets.

This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D89100
2021-08-10 13:49:22 +02:00
Carl Ritson a1783b54e8 [SimpifyCFG] Remove recursion from FoldCondBranchOnPHI. NFCI.
Avoid stack overflow errors on systems with small stack sizes
by removing recursion in FoldCondBranchOnPHI.

This is a simple change as the recursion was only iteratively
calling the function again on the same arguments.
Ideally this would be compiled to a tail call, but there is
no guarantee.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107803
2021-08-10 19:14:31 +09:00
David Green c140ff493e [ARM] Change a couple of instances of LiveRegs.contains to !LiveRegs.available
This changes a couple of calls to LiveRegs.contains to
!LiveRegs.available, one in Thumb1FrameLoweringInfo (which modifies a
test to look more correct to me, given r7 should be the frame pointer so
is not available), and another in the ARMLoadStoreOptimizer, that I
don't have a test for, it was just found by inspection.

Differential Revision: https://reviews.llvm.org/D107454
2021-08-10 09:53:26 +01:00
Tony Tye 53eb469195 [AMDGPU] Support non-strictly stronger memory orderings in SIMemoryLegalizer
C++20 no longer requires the failure memory ordering to be no stronger than the
success memory ordering. Adjust assert in AMD GPU SIMemoryLegalizer, and merge
instruction memory orderings

Add common operation to merge memory orders that allows non strict memory
orderings to be combined. Use it in SIMemoryLegalizer and
MachineMemOperand::getMergedOrdering.

Reviewed By: efriedma, rampitec

Differential Revision: https://reviews.llvm.org/D106729
2021-08-10 08:43:03 +00:00
David Sherwood ce394161cb [InstCombine] Add more complex folds for extractelement + stepvector
I have updated cheapToScalarize to also consider the case when
extracting lanes from a stepvector intrinsic. This required removing
the existing 'bool IsConstantExtractIndex' and passing in the actual
index as a Value instead. We do this because we need to know if the
index is <= known minimum number of elements returned by the stepvector
intrinsic. Effectively, when extracting lane X from a stepvector we
know the value returned is also X.

New tests added here:

  Transforms/InstCombine/vscale_extractelement.ll

Differential Revision: https://reviews.llvm.org/D106358
2021-08-10 09:17:21 +01:00
Cullen Rhodes 81f057c253 [AArch64][SVE] NFC: Remove unused p0-p7 with element size predicates
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D107752
2021-08-10 07:56:22 +00:00
David Sherwood 8439415333 [IR] Let ConstantVector::getSplat use poison instead of undef
This patch updates ConstantVector::getSplat to use poison instead
of undef when using insertelement/shufflevector to splat.

This follows on from D93793.

Differential Revision: https://reviews.llvm.org/D107751
2021-08-10 08:27:43 +01:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Usman Nadeem 5420fc4a27 [AArch64][SVE][InstCombine] Unpack of a splat vector -> Scalar extend
Replace vector unpack operation with a scalar extend operation.
  unpack(splat(X)) --> splat(extend(X))

If we have both, unpkhi and unpklo, for the same vector then we may
save a register in some cases, e.g:
  Hi = unpkhi (splat(X))
  Lo = unpklo(splat(X))
   --> Hi = Lo = splat(extend(X))

Differential Revision: https://reviews.llvm.org/D106929

Change-Id: I77c5c201131e3a50de1cdccbdcf84420f5b2244b
2021-08-09 14:58:54 -07:00
Usman Nadeem 85bbc05154 [AArch64][SVE][InstCombine] Move last{a,b} before binop if one operand is a splat value
Move the last{a,b} operation to the vector operand of the binary instruction if
the binop's operand is a splat value. This essentially converts the binop
to a scalar operation.

Example:
    // If x and/or y is a splat value:
    lastX (binop (x, y)) --> binop(lastX(x), lastX(y))

Differential Revision: https://reviews.llvm.org/D106932

Change-Id: I93ff5302f9a7972405ee0d3854cf115f072e99c0
2021-08-09 14:48:41 -07:00
Eli Friedman ac20e56911 [AArch64] Implement FCOPYSIGN for SVE.
I was originally going to try to implement this in target-independent
code, but it's actually sort of tricky to generate the correct sequence
for vectors like nxv2f32.  So just stick this in target-specific code,
at least for now.

Differential Revision: https://reviews.llvm.org/D107608
2021-08-09 12:06:48 -07:00
Arnold Schwaighofer b987c283ae [coro] Correct CurrentBlock tracking bug recently introduced
We use the CurrentBlock to determine whether we have already processed a
block. Don't reuse this variable for setting where we should insert the
rematerialization. The rematerialization block is different to the
current block when we rematerialize for coro suspend block users.

Differential Revision: https://reviews.llvm.org/D107573
2021-08-09 10:41:41 -07:00
Bradley Smith 73ecb9987b [AArch64][SVE] Fix assertion failure when lowering fixed length gather/scatter
The patterns for fixed length gather/scatter with 32-bit offsets and
64-bit memory type are slightly different that the rest of the patterns,
as such the lowering needs to be slightly different to ensure the
correct types are used.

Differential Revision: https://reviews.llvm.org/D107576
2021-08-09 14:05:22 +00:00
Jeremy Morse d4ce9e463d [DWARF] Revert sharing subprograms across CUs
This patch is a revert of e08f205f5c. In that patch, DW_TAG_subprograms
were permitted to be referenced across CU boundaries, to improve stack
trace construction using call site information. Unfortunately, as
documented in PR48790, the way that subprograms are "owned" by dwarf units
is sufficiently complicated that subprograms end up in unexpected units,
invalidating cross-unit references.

There's no obvious way to easily fix this, and several attempts have
failed. Revert this to ensure correct DWARF is always emitted.

Three tests change in addition to the reversion, but they're all very
light alterations.

Differential Revision: https://reviews.llvm.org/D107076
2021-08-09 12:43:43 +01:00
Fraser Cormack 2b4a1d4b86 [RISCV] Improve codegen for shuffles with LHS/RHS splats
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".

This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107449
2021-08-09 10:31:40 +01:00
Luo, Yuanke 53642d5b80 [NFC] Fix the formula for reciprocal calculation.
Differential Revision: https://reviews.llvm.org/D107713
2021-08-09 16:03:56 +08:00
Min-Yih Hsu 7cbcde4aa3 [M68k] Use separate asm operand class for different widths of address
This could help asm parser to pick the correct variant of instruction.
This patch also migrated all the control instructions MC tests.
2021-08-09 00:07:19 -07:00
Min-Yih Hsu cf277f0b31 [M68k][NFC] Coalesce render methods in different asm register op class
And assign RegClass (i.e. operand class for all GPR) as the super class
of ARegClass and DRegClass. Note that this is a NFC change because
actually we already had XRDReg to model either address or data register
operands (as well as test coverage for it). The new super class syntax
added here is just making the relations between three RegClass-es more
explicit.
2021-08-09 00:07:19 -07:00
Cullen Rhodes 1a18bb9270 [AArch64] NFC: Remove DecodeVectorRegisterClass from disassembler
The decoder function and table are the same as FPR128, use that instead.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107644
2021-08-09 06:52:47 +00:00
Simon Atanasyan 990e8025b5 [MC][ELF] Do not error on parsing .debug_* section directive for MIPS
MIPS .debug_* sections should have SHT_MIPS_DWARF section type to
distinguish among sections contain DWARF and ECOFF debug formats, but in
assembly files these sections have SHT_PROGBITS (@progbits) type. Now
assembler shows 'changed section type for ...' error when parsing
`.section .debug_*,"",@progbits` directive for MIPS targets.

The same problem exists for x86-64 target and this patch extends
workaround implemented in D76151. The patch adds one more case
when assembler ignores section types mismatch after `SwitchSection()`
call.

Differential Revision: https://reviews.llvm.org/D107707
2021-08-09 08:54:56 +03:00
Christudasan Devadasan fcf2d5f402 Revert "SROA: Enhance speculateSelectInstLoads"
This reverts commit ffc3fb665d.
2021-08-09 01:13:39 -04:00
Michael Liao b5e470aa2e [LowerMemIntrinsics] Typo fix. 2021-08-08 22:38:58 -04:00
Craig Topper 2f3b738960 [RISCV] Add optimizations for FMV_X_ANYEXTH similar to FMV_X_ANYEXTW_RV64.
This enables the fneg and fabs combines we have for FMV_X_ANYEXTW_RV64.
2021-08-08 18:30:48 -07:00
Craig Topper 88bc29f5f2 [RISCV] Introduce a RISCV CondCode enum instead of using ISD:SET* in MIR. NFC
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.

This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.

My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107400
2021-08-08 17:25:37 -07:00
Craig Topper 20dfe051ab [RISCV] Move the $rs operand of PseudoStore from outs to ins. NFC
This is the data to be stored so it should be an input.

To keep operand order similar between loads and stores, move the temp
register to the first dest operand of floating point loads. Rework
the assembler code accordingly.

This doesn't have any functional effect because this Pseudo is only
used by the assembler which doesn't use ins/outs.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107309
2021-08-08 15:58:24 -07:00
Kazu Hirata d9c9d13365 [DWARF] Remove collectChildrenAddressRanges (NFC)
The last use was removed on Dec 21, 2018 in commit
c3f30a7fc6.
2021-08-08 08:57:32 -07:00
Dorit Nuzman 67278b8a90 [LV] Support Interleaved Store Group With Gaps
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).

The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.

Reviewed by: Ayal Zaks

Differential Revision: https://reviews.llvm.org/D104750
2021-08-08 10:32:02 +03:00
Min-Yih Hsu 657bb7262d [M68k] Separate ADDA from ADD and migrate rest of the arithmetic MC tests
Previously ADD & ADDA (as well as SUB & SUBA) instructions are mixed
together, which not only violated Motorola assembly's syntax but also
made asm parsing more difficult. This patch separates these two kinds of
instructions migrate rest of the tests from
test/CodeGen/M68k/Encoding/Arithmetic to test/MC/M68k/Arithmetic.

Note that we observed minor regressions on codegen quality: Sometimes
isel uses ADD instead of ADDA even the latter can lead to shorter
sequence of code. This issue implies that some isel patterns might need
to be updated.
2021-08-07 17:19:12 -07:00
Craig Topper d4ee84ceee [RISCV] Support FP_TO_S/UINT_SAT for i32 and i64.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of 0 required for the ISD opcodes.

This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.

We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107230
2021-08-07 16:06:00 -07:00
Nikita Popov 88003cea1c [MemCpyOpt] Remove MemDepAnalysis-based implementation
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.

I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.

Differential Revision: https://reviews.llvm.org/D102113
2021-08-07 22:35:44 +02:00
Krishna a9a176ca3b [InstCombine] Remove nnan requirement for transformation to fabs from select
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D106872
2021-08-07 22:38:45 +05:30
Craig Topper 24dfba8d50 [X86] Teach shouldSinkOperands to recognize pmuldq/pmuludq patterns.
The IR for pmuldq/pmuludq intrinsics uses a sext_inreg/zext_inreg
pattern on the inputs. Ideally we pattern match these away during
isel. It is possible for LICM or other middle end optimizations
to separate the extend from the mul. This prevents SelectionDAG
from removing it or depending on how the extend is lowered, we
may not be able to generate an AssertSExt/AssertZExt in the
mul basic block. This will prevent pmuldq/pmuludq from being
formed at all.

This patch teaches shouldSinkOperands to recognize this so
that CodeGenPrepare will clone the extend into the same basic
block as the mul.

Fixes PR51371.

Differential Revision: https://reviews.llvm.org/D107689
2021-08-07 08:45:56 -07:00
Roman Lebedev 0a241e90d4
[NFC][InstCombine] `vector_reduce_xor(?ext(<n x i1>))` --> `?ext(vector_reduce_add(<n x i1>))`
Instead of expanding it ourselves,
we can just forward to `?ext(vector_reduce_add(<n x i1>))`, as per alive2:
https://alive2.llvm.org/ce/z/ymz7zE (self)
https://alive2.llvm.org/ce/z/eKu2v2 (skipped zext)
https://alive2.llvm.org/ce/z/c3BXgc (skipped sext)
2021-08-07 17:31:33 +03:00
Roman Lebedev c6ff867f92
[NFC][InstCombine] Simplify emitted IR for `vector_reduce_xor(?ext(<n x i1>))`
Now that we canonicalize low bit splatting to the form we were emitting
here ourselves, emit simpler IR that will be canonicalized later.

See 1e801439be for proofs:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)
2021-08-07 17:31:24 +03:00
Roman Lebedev e71870512f
[InstCombine] Prefer `-(x & 1)` as the low bit splatting pattern (PR51305)
Both patterns are equivalent (https://alive2.llvm.org/ce/z/jfCViF),
so we should have a preference. It seems like mask+negation is better
than two shifts.
2021-08-07 17:25:28 +03:00
Christudasan Devadasan ffc3fb665d SROA: Enhance speculateSelectInstLoads
Allow the folding even if there is an
intervening bitcast.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106667
2021-08-07 09:09:14 -04:00
Florian Hahn a00aafc30d
[VPlan] Iterate over phi recipes to detect reductions to fix.
After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.

This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100102
2021-08-07 14:06:50 +01:00
Amara Emerson 4c2e01232c [GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs.
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().

We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.
2021-08-07 01:41:02 -07:00
Nemanja Ivanovic 62fe3dcf98 Fix PPC buildbot break caused by 4c4093e6e3
This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
  not an IEEE-compliant type
- Does not have a representation that allows for straightforward
  reinterpreting as an integer and use of integer operations

The result was that this commit broke __builtin_isnan for ppc_fp128
making many valid numeric values report a NaN.

This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is inline with previous semantics.
2021-08-06 22:10:20 -05:00
Steffen Larsen 1b4c85fc02 [NVPTX] Add NVPTX intrinsics for CUDA PTX 6.5 ldmatrix instructions
Adds NVPTX intrinsics for the CUDA PTX `ldmatrix.sync.aligned` instructions added in PTX 6.5.

PTX ISA description of `ldmatrix.sync.aligned`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107046
2021-08-06 16:13:35 -07:00
Sanjay Patel 0369714b31 [InstCombine] reduce vector casting before icmp
There may be some generalizations (see test comments) of these patterns,
but this should handle the cases motivated by:
https://llvm.org/PR51315
https://llvm.org/PR51259

The backend may want to transform differently, but at least for
the x86 examples that I looked at, there does not appear to be
any significant perf diff either way.
2021-08-06 17:09:38 -04:00
Michael Liao 05783e1cfe [amdgpu] Revise the conversion from i64 to f32.
- Replace 'cmp+sel' with 'umin' if possible.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107507
2021-08-06 17:01:47 -04:00
Amara Emerson 2b067e3335 Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG.
DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.
2021-08-06 12:57:53 -07:00
Thomas Johnson f8a4495149 [ARC] Add codegen for llvm.ctlz intrinsic for the ARC backend
Differential Revision: https://reviews.llvm.org/D107611
2021-08-06 12:18:06 -07:00
Artem Belevich 6a9cf21f5a [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.
Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize `memset` intrinsic.

While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.

For now introduce a flag to force-enable the flag and opt-in only CUDA
compilation with NVPTX back-end.

Differential Revision: https://reviews.llvm.org/D106401
2021-08-06 11:13:52 -07:00
Zheng Chen 30b0c455b1 [LoopCacheAnalysis]: handle mismatch type for Numerator and CacheLineSize
fix an assertion due to mismatch type for Numerator and CacheLineSize in loop cache analysis pass.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D107618
2021-08-06 16:51:09 +00:00
Jon Roelofs eae4a44c1d [GlobalISel][KnownBits] Implement G_CTPOP
Implementation copied almost verbatim from ValueTracking.

Differential revision: https://reviews.llvm.org/D107606
2021-08-06 09:48:39 -07:00
Michael Liao d1cacd5928 [MemCpyOpt] Teach memcpyopt to handle loads from the constant memory.
- Loads from the constant memory (either explicit one or as the source
  of memory transfer intrinsics) won't alias any stores.

Reviewed By: asbirlea, efriedma

Differential Revision: https://reviews.llvm.org/D107605
2021-08-06 12:43:52 -04:00
Craig Topper b2ca4dc935 [LegalizeTypes] Add a simple expansion for SMULO when a libcall isn't available.
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.

This can make D107420 unnecessary.

Needs tests, but I wanted to start discussion about D107420.

Reviewed By: FreddyYe

Differential Revision: https://reviews.llvm.org/D107581
2021-08-06 09:43:01 -07:00
David Green 77e8f4eeee [ARM] Define ComplexPatternFuncMutatesDAG
Some of the Arm complex pattern functions call canExtractShiftFromMul,
which can modify the DAG in-place. For this to be valid and handled
successfully we need to define ComplexPatternFuncMutatesDAG.

Differential Revision: https://reviews.llvm.org/D107476
2021-08-06 17:35:11 +01:00
Kazu Hirata 276be84d0a [CodeGen] Remove computeDefOperandLatency (NFC)
The last use was removed on Oct 9, 2016 in commit
5c924d7117.
2021-08-06 08:26:55 -07:00
Jay Foad 57b9107e3f [GlobalISel] Improve widening of cttz/cttz_zero_undef
Differential Revision: https://reviews.llvm.org/D107631
2021-08-06 14:25:56 +01:00
Mircea Trofin ae1a2a09e4 [NFC][MLGO] Make logging more robust
1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same nr of entries

2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.

Differential Revision: https://reviews.llvm.org/D107594
2021-08-06 04:44:52 -07:00
Reshabh Sharma 5173854f19 [AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682
2021-08-06 15:53:33 +05:30
Simon Pilgrim dbce6a8d9d [ARM] Fold insert_subvector to concat_vectors
D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage.

I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.
2021-08-06 11:21:31 +01:00
Simon Pilgrim 18e6a03b1a [X86][AVX] Extract SUBV_BROADCAST constant bits from just the lower subvector range (PR51281)
As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width.

The getTargetConstantBitsFromNode was assuming that the Constant would the same size as the subvector, resulting in the incorrect packing of the per-element bits data.

This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits from Constant of that subvector bitsize.

Differential Revision: https://reviews.llvm.org/D107158
2021-08-06 11:21:31 +01:00
Cullen Rhodes 08bc441174 [AArch64] NFC: drop unnecessary llvm:: namespace prefix on MCInst 2021-08-06 09:23:18 +00:00
David Sherwood 3fd96e1b2e [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
  loop invariant input operands then these are also uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intriniscs:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284
2021-08-06 10:13:15 +01:00
Chuanqi Xu 0fd03feb4b [FuncSpec] Return changed if function is changed by tryToReplaceWithConstant
The may get changed before specialization by RunSCCPSolver. In other
words, the pass may change the function without specialization happens.
Add test and comment to reveal this.
And it may return No Changed if the function get changed by
RunSCCPSolver before the specialization. It looks like a potential bug.

Test Plan: check-all

Reviewed By: https://reviews.llvm.org/D107622

Differential Revision: https://reviews.llvm.org/D107622
2021-08-06 17:00:17 +08:00
David Sherwood 43a5c750d1 Revert "[LoopVectorize] Add support for replication of more intrinsics with scalable vectors"
This reverts commit 95800da914.
2021-08-06 09:48:16 +01:00
Jay Foad 83610d4eb0 [AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107474
2021-08-06 09:40:48 +01:00
Jay Foad 24b67a9024 [AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef
We can improve on the generic splitting by using ffbh/ffbl, which have a
defined result when the input is zero.

Differential Revision: https://reviews.llvm.org/D107442
2021-08-06 09:40:48 +01:00
Jay Foad d77b43c385 [AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32
This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.

Differential Revision: https://reviews.llvm.org/D107441
2021-08-06 09:40:48 +01:00
Jay Foad cd2594e1c6 [GlobalISel] Improve legalization of narrow CTTZ
Differential Revision: https://reviews.llvm.org/D107457
2021-08-06 09:40:48 +01:00
Chuanqi Xu 62fc3e0ad6 [NFC] [FuncSpec] Remove unused variables in isArgumentInteresting 2021-08-06 16:38:20 +08:00
Chuanqi Xu cc3f40bb41 [FuncSpec] Move invariant computation for spec cost out of loop (NFC-ish)
Noticed that the computation for function specialization cost of a
function wouldn't change during the traversal of the arguments for the
function. We could hoist the computation out of the traversal. I
observed about ~1% improvement on compile time for spec2017. But I guess
it may not be precise. This should be NFC and fine.

Reviewed By: Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D107621
2021-08-06 15:43:05 +08:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Florian Hahn 3e58dd19df
[LV] Move reduction PHI node fixup to VPlan::execute (NFC).
All information to fix-up the reduction phi nodes in the vectorized loop
is available in VPlan now. This patch moves the code to do so, to make
this clearer. Fixing up the loop exit value still relies on other
information and remains outside of VPlan for now.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100113
2021-08-06 08:29:20 +01:00
Chuanqi Xu 82ca845b47 [NFC] [FuncSpec] Update the Todo list for recursive functions
Now the recursive functions may get specialized many times when
`func-specialization-max-iters` increases. See discussion in
https://reviews.llvm.org/D106426 for details.
2021-08-06 14:43:17 +08:00
Amara Emerson 4fee756c75 Delete copy-ctor of MachineFrameInfo.
I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo()
without declaring the variable as a reference.

Deleting the copy-constructor also showed a place in the ARM backend which was
doing the same thing, albeit it didn't impact correctness there from the looks of it.
2021-08-05 23:24:37 -07:00
Kai Luo 666ee849f0 [PowerPC] Fix shift amount of xxsldwi when performing vector int_to_double
POC
```
// main.c
#include <stdio.h>
#include <altivec.h>
extern vector double foo(vector int s);
int main() {
  vector int s = {0, 1, 0, 4};
  vector double vd;
  vd = foo(s);
  printf("%lf %lf\n", vd[0], vd[1]);
  return 0;
}
// poc.c
vector double foo(vector int s) {
  int x1 = s[1];
  int x3 = s[3];
  double d1 = x1;
  double d3 = x3;
  vector double x = { d1, d3 };
  return x;
}
```
Compiled with `poc.c main.c -mcpu=pwr8 -O3` on BE machine.
Current clang gives
```
4.000000 1.000000
```
while xlc gives
```
1.000000 4.000000
```
Xlc's output should be correct.

Reviewed By: shchenz, #powerpc

Differential Revision: https://reviews.llvm.org/D107428
2021-08-06 06:01:29 +00:00
Arthur Eubanks a1b21ed3fb [GCov] Emit memset instead of stores in __llvm_gcov_reset
For a very large module, __llvm_gcov_reset can become very large.
__llvm_gcov_reset previously emitted stores to a bunch of globals in one
huge basic block. MemCpyOpt would turn many of these stores into
memsets, and updating MemorySSA would be extremely slow.

Verified that this makes the compile time of certain files go down
drastically (20min -> 5min).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107538
2021-08-05 22:40:15 -07:00
Serge Bazanski 7ece20505f [Lanai] fix lowering wide returns
This implements LanaiTargetLowering::CanLowerReturn, thereby ensuring
all return values conform to the RetCC and get sret-demoted as
necessary.

A regression test is also added that exercises this functionality.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D107086
2021-08-05 21:08:09 -07:00
Jinsong Ji 6f84d94b9c [PowerPC] Fix copy/paste error in scalar_to_vector patterns
https://reviews.llvm.org/D100478 refactoring added a copy/paste error
for v8i16 patterns.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D107609
2021-08-06 02:59:01 +00:00
Jessica Paquette e6a3944ea9 [AArch64][GlobalISel] Overhaul G_INSERT legalization
Similar cleanup to G_EXTRACT (51bd4e874f).

Also swap the order of clamp/widen to avoid unnecessary complex merges.

Add a bunch of missing testcases to legalize-inserts while we're at it.

Differential Revision: https://reviews.llvm.org/D107601
2021-08-05 18:28:22 -07:00
Jessica Paquette 562c8e14d9 [AArch64][GlobalISel] Widen G_IMPLICIT_DEF and G_FREEZE before clamping
Similar to other cleanup commits which widen instructions before clamping
during legalization. Purpose of this is to avoid weird type breakdowns.

In terms of G_IMPLICIT_DEF, this simplifies legalization for other instructions.
The legalizer has to emit G_IMPLICIT_DEF to legalize certain instructions, so
this can help with emitting merges elsewhere.

Differential Revision: https://reviews.llvm.org/D107604
2021-08-05 18:21:14 -07:00
Sean Fertile 23651c5ae0 [PowerPC][AIX] Create multiple constant sections.
Fixes issue where late materialized constants can be more strictly
aligned then their containing csect.

Differential Revision: https://reviews.llvm.org/D103103
2021-08-05 21:19:16 -04:00
Jon Roelofs 5fc7b1a260 Revert "[GlobalISel][KnownBits] Implement G_CTPOP"
This reverts commit ce6eb4f15a.

It's broken on the windows bots: https://reviews.llvm.org/D107606#2930121
2021-08-05 17:47:47 -07:00
Jon Roelofs ce6eb4f15a [GlobalISel][KnownBits] Implement G_CTPOP
Implementation copied almost verbatim from ValueTracking.

Differential revision: https://reviews.llvm.org/D107606
2021-08-05 17:17:29 -07:00
Ryan Prichard 623cf3dfdf Mark getc_unlocked as unavailable by default
Before D45736, getc_unlocked was available by default, but turned off
for non-Cygwin/non-MinGW Windows. D45736 then added 9 more unlocked
functions, which were unavailable by default, but it also:
 * left getc_unlocked enabled by default,
 * removed the disabling line for Windows, and
 * added code to enable getc_unlocked for GNU, Android, and OSX.

For consistency, make getc_unlocked unavailable by default. Maybe this
was the intent of D45736 anyway.

Reviewed By: MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D107527
2021-08-05 16:35:02 -07:00
Jessica Paquette 8a557d8311 [AArch64][GlobalISel] Widen extloads before clamping during legalization
Allows us to avoid awkward type breakdowns on types like s88, like the other
commits.

Differential Revision: https://reviews.llvm.org/D107587
2021-08-05 16:14:06 -07:00
Stanislav Mekhanoshin d71924fbfe [AMDGPU] Improve v2i32/v2f32 insertelt patterns
Using REG_SEQUENCE produces better code than INSERT_SUBREG,
we can omit one move instruction in many cases.

Fixes: SWDEV-298028

Differential Revision: https://reviews.llvm.org/D107602
2021-08-05 16:13:39 -07:00
Heejin Ahn 41ba39dfcd [WebAssembly] Don't do SjLj transformation when there's only setjmp
When there is a `setjmp` call in a function, we transform every callsite
of `setjmp` to record its information by calling `saveSetjmp` function,
and we also transform every callsite of a function that can longjmp to
to check if a longjmp occurred and if so jump to the corresponding
post-setjmp BB. Currently we are doing this for every function that
contains a call to `setjmp`, but if there is no other function call
within that function that can longjmp, this transformation of `setjmp`
callsite and all the preparation of `setjmpTable` in the entry of the
function are not necessary.

This checks if a setjmp-calling function has any other calls that can
longjmp, and if not, skips the function for the SjLj transformation.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D107530
2021-08-05 15:28:02 -07:00
David Green 649cf4514d [AArch64] Expand the SVE min/max reduction costs to NEON
This takes the existing SVE costing for the various min/max reduction
intrinsics and expands it to NEON, where I believe it applies equally
well.

In the process it changes the lowering to use min/max cost, as opposed
to summing up the cost of ICmp+Select.

Differential Revision: https://reviews.llvm.org/D106239
2021-08-05 23:23:24 +01:00
Jessica Paquette 36498374d4 [AArch64][GlobalISel] Widen G_BSWAP before clamping
This allows us to avoid odd type breakdowns + allows us to legalize types like
s88 in the first place.

Add some testcases for known legal types + testcases for s4 and s88.

Differential Revision: https://reviews.llvm.org/D107607
2021-08-05 15:16:00 -07:00
Jessica Paquette 51bd4e874f [AArch64][GlobalISel] Overhaul G_EXTRACT legalization
This simplifies our existing G_EXTRACT rules and adds some test coverage. Mostly
changing this because it should make it easier to improve legalization for
instructions which use G_EXTRACT as part of the legalization process.

This also adds support for legalizing some weird types. Similar to other recent
legalizer changes, this changes the order of widening/clamping.

There was some dead code in our existing rules (e.g. the p0 case would never get
hit), so this knocks those out and makes the types we want to handle explicit.

This also removes some checks which, nowadays, are handled by the
MachineVerifier.

Differential Revision: https://reviews.llvm.org/D107505
2021-08-05 13:55:15 -07:00
Jon Roelofs 98f38c151b [AArch64][GlobalISel] Legalize ctpop s128
This is re-landing the same patch again, but without the changes to
LegalizerHelper that regressed the Mips test:

test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll

Differential revision: https://reviews.llvm.org/D106494
2021-08-05 11:54:53 -07:00
Chris Jackson 113a06f7a5 {DebugInfo][LSR] Don't cache dbg.value that are already undef
The SCEV-based salvaging method caches dbg.value information pre-LSR so
that salvaging may be attempted post-LSR. If the dbg.value are already
undef pre-LSR then a salvage attempt would be fruitless, so avoid
caching them.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D107448
2021-08-05 19:16:43 +01:00
Roman Lebedev c0586ff05d
[NFC][X86] combineX86ShuffleChain(): hoist Mask variable higher up
Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef`
to it is confusing. Instead, just move the `Mask` vector higher up,
and change the code that earlier had no access to it but now does
to use `Mask` instead of `BaseMask`.

This has no other intentional changes.

This is a recommit of 35c0848b57,
that was reverted to simplify reversion of an earlier change.
2021-08-05 20:37:51 +03:00
Ramesh Peri 976bd23612 [llvm-ar] Fix for handling thin archive with SYM64 and a test case for it
WHen thin archives are created which have symbol table of type SYM64 then all the tools will not work since they cannot read the files properly.
One can reproduce the problem as follows:
1. Take a hello world program and create an archive out of it. The SYM64_THRESHOLD=0 will force the generation of SYM64 symbol table.
    clang -c hello.cpp
    SYM64_THRESHOLD=0 llvm-ar crsT mylib.a hello.o
2. Now try to use any of the tools on this mylib.a and it will fail.
    llvm-nm -M mylib.a

THis fix will eliminate these failures. A regression test is created in llvm/test/Object/archive-symtab.test

Reviewed By: MaskRay, Ramesh

Differential Revision: https://reviews.llvm.org/D107322
2021-08-05 10:06:34 -07:00
Benjamin Kramer bd17ced1db Revert "[X86] combineX86ShuffleChain(): canonicalize mask elts picking from splats"
This reverts commits f819e4c7d0 and
35c0848b57. It triggers an infinite loop during
compilation.

$ cat t.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @MaxPoolGradGrad_1.65() local_unnamed_addr #0 {
entry:
  %wide.vec78 = load <64 x i32>, <64 x i32>* null, align 16
  %strided.vec83 = shufflevector <64 x i32> %wide.vec78, <64 x i32> poison, <8 x i32> <i32 4, i32 12, i32 20, i32 28, i32 36, i32 44, i32 52, i32 60>
  %0 = lshr <8 x i32> %strided.vec83, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
  %1 = add <8 x i32> zeroinitializer, %0
  %2 = shufflevector <8 x i32> %1, <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %3 = shufflevector <16 x i32> %2, <16 x i32> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %interleaved.vec = shufflevector <32 x i32> undef, <32 x i32> %3, <64 x i32> <i32 0, i32 8, i32 16, i32 24, i32 32, i32 40, i32 48, i32 56, i32 1, i32 9, i32 17, i32 25, i32 33, i32 41, i32 49, i32 57, i32 2, i32 10, i32 18, i32 26, i32 34, i32 42, i32 50, i32 58, i32 3, i32 11, i32 19, i32 27, i32 35, i32 43, i32 51, i32 59, i32 4, i32 12, i32 20, i32 28, i32 36, i32 44, i32 52, i32 60, i32 5, i32 13, i32 21, i32 29, i32 37, i32 45, i32 53, i32 61, i32 6, i32 14, i32 22, i32 30, i32 38, i32 46, i32 54, i32 62, i32 7, i32 15, i32 23, i32 31, i32 39, i32 47, i32 55, i32 63>
  store <64 x i32> %interleaved.vec, <64 x i32>* undef, align 16
  unreachable
}

$ llc < t.ll -mcpu=skylake
<hang>
2021-08-05 18:58:08 +02:00
Jessica Paquette f3f3098afe [AArch64][GlobalISel] Mark v16s8 <- v8s8, v8s8 G_CONCAT_VECTOR as legal
G_CONCAT_VECTORS shows up from time to time when legalizing other instructions.

We actually import patterns for the v16s8 <- v8s8, v8s8 case so marking it
as legal gives us selection for free.

Differential Revision: https://reviews.llvm.org/D107512
2021-08-05 09:40:46 -07:00
Kazu Hirata 72661f337a [Transforms] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-08-05 08:53:17 -07:00
Alexey Bataev e7c3eaa8ae [SLP]Do not emit extra shuffle for insertelements vectorization.
If the vectorized insertelements instructions form indentity subvector
(the subvector at the beginning of the long vector), it is just enough
to extend the vector itself, no need to generate inserting subvector
shuffle.

Differential Revision: https://reviews.llvm.org/D107494
2021-08-05 08:41:24 -07:00
Craig Topper f7076cfd3a [DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits contant folding.
We don't have real demanded bits support for MULHU, but we can
still use the known bits based constant folding support at the end
of SimplifyDemandedBits to simplify a MULHU. This helps with cases
where we know the LHS and RHS have enough leading zeros so that
the high multiply result is always 0.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D106471
2021-08-05 08:31:26 -07:00
David Sherwood e9177b0958 Fix build issues caused by 95800da914 2021-08-05 16:26:34 +01:00
Sander de Smalen 3e47f009ff [LV] Consider ExtractValue as uniform.
Since all operands to ExtractValue must be loop-invariant when we deem
the loop vectorizable, we can consider ExtractValue to be uniform.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107286
2021-08-05 16:20:50 +01:00
eopXD fd7f6a3c81 [NFC][LoopIdiom] rename boolean variable NegStride to IsNegStride
Rename variable for better code readability.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107570
2021-08-05 23:11:42 +08:00
Jay Foad 2b63933115 [AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107566
2021-08-05 15:57:40 +01:00
Jay Foad e6c364a624 [AMDGPU][SDag] Better lowering for 64-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107546
2021-08-05 15:57:40 +01:00
Momchil Velikov f171149e0d [SimpifyCFG] Speculate a store preceded by a local non-escaping load
In SimplifyCFG we may simplify the CFG by speculatively executing
certain stores, when they are preceded by a store to the same
location.  This patch allows such speculation also when the stores are
similarly preceded by a load.

In order for this transformation to be correct we need to ensure that
the memory location is writable and the store in the new location does
not introduce a data race.

Local objects (created by an `alloca` instruction) are always
writable, so once we are past a read from a location it is valid to
also write to that same location.

Seeing just a load does not guarantee absence of a data race (unlike
if we see a store) - the load may still be part of a race, just not
causing undefined behaviour
(cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic).

In the original program, a data race might have been prevented by the
condition, but once we move the store outside the condition, we must
be sure a data race wasn't possible anyway, no matter what the
condition evaluates to.

One way to be sure that a local object is never concurrently
read/written is check that its address never escapes the function.

Hence this transformation is restricted to local, non-escaping
objects.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D107281
2021-08-05 15:54:42 +01:00
Simon Pilgrim 2cbf9fd402 [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.

This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.

This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.

Fixes PR50053

Differential Revision: https://reviews.llvm.org/D107068
2021-08-05 15:40:48 +01:00
Florian Hahn 38b098be66
[VectorCombine] Limit scalarization known non-poison indices.
We can only trust the range of the index if it is guaranteed
non-poison.

Fixes PR50949.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107364
2021-08-05 15:36:31 +01:00
Dawid Jurczak 06206a8cd1 [BuildLibCalls][NFC] Remove redundant attribute list from emitCalloc
Additionally with this patch aligned DSE which is the only user of emitCalloc.

Differential Revision: https://reviews.llvm.org/D103523
2021-08-05 16:18:38 +02:00
David Sherwood 95800da914 [LoopVectorize] Add support for replication of more intrinsics with scalable vectors
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
  loop invariant input operands then these are also uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intriniscs:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284
2021-08-05 15:17:27 +01:00
Dawid Jurczak f8cdde7195 [SimplifyLibCalls][NFC] Clean up LibCallSimplifier from 'memset + malloc into calloc' transformation
FoldMallocMemset can be safely removed because since https://reviews.llvm.org/D103009
such transformation is already performed in DSE.

Differential Revision: https://reviews.llvm.org/D103451
2021-08-05 16:08:32 +02:00
Krzysztof Parzyszek d0c3b61498 Delay initialization of OptBisect
When LLVM is used in other projects, it may happen that global cons-
tructors will execute before the call to ParseCommandLineOptions.
Since OptBisect is initialized via a constructor, and has no ability
to be updated at a later time, passing "-opt-bisect-limit" to the
parse function may have no effect.

To avoid this problem use a cl::cb (callback) to set the bisection
limit when the option is actually processed.

Differential Revision: https://reviews.llvm.org/D104551
2021-08-05 09:04:17 -05:00
Bardia Mahjour 0e08891ec1 [DA] control compile-time spent by MIV tests
Function exploreDirections() in DependenceAnalysis implements a recursive
algorithm for refining direction vectors. This algorithm has worst-case
complexity of O(3^(n+1)) where n is the number of common loop levels.
In this patch I'm adding a threshold to control the amount of time we
spend in doing MIV tests (which most of the time end up resulting in over
pessimistic direction vectors anyway).

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D107159
2021-08-05 09:50:11 -04:00
Sander de Smalen 8d08a84745 [LV] Remove a change that was added in D106164.
This change wasn't strictly necessary for D106164 and could be removed.
This patch addresses the post-commit comments from @fhahn on D106164, and
also changes sve-widen-gep.ll to use the same IR test as shown in
pointer-induction.ll.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D106878
2021-08-05 14:44:53 +01:00
Paul Robinson 75aa3d520d Add a DIExpression const-folder to prevent silly expressions.
It's entirely possible (because it actually happened) for a bool
variable to end up with a 256-bit DW_AT_const_value.  This came about
when a local bool variable was initialized from a bitfield in a
32-byte struct of bitfields, and after inlining and constant
propagation, the variable did have a constant value. The sequence of
optimizations had it carrying "i256" values around, but once the
constant made it into the llvm.dbg.value, no further IR changes could
affect it.

Technically the llvm.dbg.value did have a DIExpression to reduce it
back down to 8 bits, but the compiler is in no way ready to emit an
oversized constant *and* a DWARF expression to manipulate it.
Depending on the circumstances, we had either just the very fat bool
value, or an expression with no starting value.

The sequence of optimizations that led to this state did seem pretty
reasonable, so the solution I came up with was to invent a DWARF
constant expression folder.  Currently it only does convert ops, but
there's no reason it couldn't do other ops if that became useful.

This broke three tests that depended on having convert ops survive
into the DWARF, so I added an operator that would abort the folder to
each of those tests.

Differential Revision: https://reviews.llvm.org/D106915
2021-08-05 06:14:40 -07:00
Petar Avramovic 66de26b1f9 GlobalISel: Fix matchEqualDefs for instructions with multiple defs
Instructions that produceSameValue produce same values for operands with
same index. matchEqualDefs used to return true for any two values from
different instructions that produce same values. Fix this by checking if
values are defined by operands with the same index.

Differential Revision: https://reviews.llvm.org/D107362
2021-08-05 15:05:45 +02:00
Simon Pilgrim e78bf49a58 [X86] Rename Subtarget Tuning Feature Flag Prefix. NFC.
As suggested on D107370, this patch renames the tuning feature flags to start with 'Tuning' instead of 'Feature'.

Differential Revision: https://reviews.llvm.org/D107459
2021-08-05 13:09:23 +01:00
Dominik Montada cc947e29ea [GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107330
2021-08-05 13:52:10 +02:00
Fraser Cormack 0b8471e91b [SelectionDAG] Correctly determine the VECREDUCE_SEQ_FMUL action
The LegalizeAction for this node should follow the logic for
`VECREDUCE_SEQ_FADD` and be determined using the vector operand's type.

here isn't an in-tree target that makes use of this, but I think it's safe to
say this is how it should behave, should a target want to customize the action
for this node.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107478
2021-08-05 09:42:33 +01:00
Jay Foad e790b2b744 [AMDGPU] Make more use of getHiHalf64 and split64BitValue. NFCI. 2021-08-05 09:36:13 +01:00
Igor Kudrin 2c14798ead [ARM][llvm-objdump] Annotate PC-relative memory operands of VLDR instructions
This extends D105979 and adds support for VLDR instructions.

Differential Revision: https://reviews.llvm.org/D105980
2021-08-05 14:11:11 +07:00
Igor Kudrin ddbe812bcc [ARM][llvm-objdump] Annotate PC-relative memory operands
This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for
Arm so that the disassembler can print the target address of memory
operands that use PC+immediate addressing.

Differential Revision: https://reviews.llvm.org/D105979
2021-08-05 14:11:11 +07:00
eopXD 26aa1bbe97 [NFCI] [LoopIdiom] Let processLoopStridedStore take StoreSize as SCEV instead of unsigned
Letting it take SCEV allows further modification on the function to optimize
if the StoreSize / Stride is runtime determined.

This is a preceeding of D107353.
The big picture is to let LoopIdiom deal with runtime-determined sizes.

Reviewed By: Whitney, lebedev.ri

Differential Revision: https://reviews.llvm.org/D104595
2021-08-05 13:21:48 +08:00
Heejin Ahn aa0b0fbbe6 [WebAssembly] Use `SDValue::getConstantOperandVal` (NFC)
Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D107499
2021-08-04 21:15:23 -07:00
Nathan Lanza 5848166369 Disable LibFuncs for stpcpy and stpncpy for Android < 21
These functions don't exist in android API levels < 21. A change in
llvm-12 (rG6dbf0cfcf789) caused Oz builds to emit this symbol assuming
it's available and thus is causing link errors. Simply disable it here.

Differential Revision: https://reviews.llvm.org/D107509
2021-08-04 22:48:41 -04:00
Matt Jacobson 75abeb64ce [AVR] emit 'MCSA_Global' references to '__do_global_ctors' and '__do_global_dtors'
Emit references to '__do_global_ctors' and '__do_global_dtors' to allow
constructor/destructor routines to run.

Reviewed by: MaskRay

Differential Revision: https://reviews.llvm.org/D107133
2021-08-05 10:37:36 +08:00
modimo 041b525141 [CSSPGO] Remove used of PseudoProbeAttributes::Reserved
D106861 added usage of PseudoProbeAttributes::Reserved as TailCall however this usage hasn't been committed/reviewed. Removing this usage.

Testing
ninja check-all

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D107514
2021-08-04 17:23:56 -07:00
Yonghong Song e52946b9ab BPF: avoid NE/EQ loop exit condition
Kuniyuki Iwashima reported in [1] that llvm compiler may
convert a loop exit condition with "i < bound" to "i != bound", where
"i" is the loop index variable and "bound" is the upper bound.
In case that "bound" is not a constant, verifier will always have "i != bound"
true, which will cause verifier failure since to verifier this is
an infinite loop.

The fix is to avoid transforming "i < bound" to "i != bound".
In llvm, the transformation is done by IndVarSimplify pass.
The compiler checks loop condition cost (i = i + 1) and if the
cost is lower, it may transform "i < bound" to "i != bound".
This patch implemented getArithmeticInstrCost() in BPF TargetTransformInfo
class to return a higher cost for such an operation, which
will prevent the transformation for the test case
added in this patch.

 [1] https://lore.kernel.org/netdev/1994df05-8f01-371f-3c3b-d33d7836878c@fb.com/

Differential Revision: https://reviews.llvm.org/D107483
2021-08-04 16:54:16 -07:00
Jessica Paquette ca2e053652 [AArch64][GlobalISel] Legalize wide vector G_PHIs
Clamp the max number of elements when legalizing G_PHI. This allows us to
legalize some common fallbacks like 4 x s64.

Here's an example: https://godbolt.org/z/6YocsEYTd

Had to add -global-isel-abort=0 to legalize-phi.mir to account for the
G_EXTRACT_VECTOR_ELT from the 32 x s8 G_PHI.

Differential Revision: https://reviews.llvm.org/D107508
2021-08-04 16:48:59 -07:00
Heejin Ahn 31a71a393f [WebAssembly] Make result of 'catch' inst variadic
`catch` instruction can have any number of result values depending on
its tag, but so far we have only needed a single i32 return value for
C++ exception so the instruction was specified that way. But using the
instruction for SjLj handling requires multiple return values.

This makes `catch` instruction's results variadic and moves selection of
`throw` and `catch` instruction from ISelLowering to ISelDAGToDAG.
Moving `catch` to ISelDAGToDAG is necessary because I am not aware of
a good way to do instruction selection for variadic output instructions
in TableGen. This also moves `throw` because 1. `throw` and `catch`
share the same utility function and 2. there is really no reason we
should do that in ISelLowering in the first place. What we do is mostly
the same in both places, and moving them to ISelDAGToDAG allows us to
remove unnecessary mid-level nodes for `throw` and `catch` in
WebAssemblyISD.def and WebAssemblyInstrInfo.td.

This also adds handling for new `catch` instruction to AsmTypeCheck.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D107423
2021-08-04 14:05:33 -07:00
Fangrui Song 9c19b36f1c [X86] Remove -x86-experimental-pref-loop-alignment in favor of -align-loops 2021-08-04 13:23:57 -07:00
Fangrui Song a194438615 [CodeGen] Add -align-loops
to `lib/CodeGen/CommandFlags.cpp`. It can replace
-x86-experimental-pref-loop-alignment=.

The loop alignment is only used by MachineBlockPlacement.
The implementation uses a new `llvm::TargetOptions` for now, as
an IR function attribute/module flags metadata may be overkill.

This is the llvm part of D106701.
2021-08-04 12:45:18 -07:00
Michael Liao 5edc886e90 [amdgpu] Add an enhanced conversion from i64 to f32.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D107187
2021-08-04 15:33:12 -04:00
Nikita Popov bb15861e14 [MemCpyOpt] Relax libcall checks
Rather than blocking the whole MemCpyOpt pass if the libcalls are
not available, only disable creation of new memset/memcpy intrinsics
where only load/stores were used previously. This only affects the
store merging and load-store conversion optimization. Other
optimizations are derived from existing intrinsics, which are
well-defined in the absence of libcalls -- not having the libcalls
just means that call simplification won't convert them to intrinsics.

This is a weaker variation of D104801, which dropped these checks
entirely. Ideally we would not couple emission of intrinsics to
libcall availability at all, but as the intrinsics may be legalized
to libcalls we need to be a bit careful right now.

Differential Revision: https://reviews.llvm.org/D106769
2021-08-04 21:17:51 +02:00
Giorgis Georgakoudis 29a3e3dd7b [OpenMPOpt] Expand SPMDization with guarding for target parallel regions
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region, thus must be not be guarded. This check is implemented using the ParallelLevels AA.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106892
2021-08-04 11:49:24 -07:00
Craig Topper c23405174a [DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS.
This allows special constants like to 0 to be recognized. It's also
expected by isel patterns if a target had a mulh with immediate instructions.
The commuting done by tablegen won't commute patterns with immediates since it
expects DAGCombine to have done it.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107486
2021-08-04 11:39:23 -07:00
Alexey Bataev 214f99b27c Revert "[SLP]Do not emit extra shuffle for insertelements vectorization."
This reverts commit 871ea69803 to fix the
problem if the first vector is not just undef.
2021-08-04 11:28:59 -07:00
Reshabh Sharma dce35ef104 Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list"
This reverts commit d42e70b3d3.
2021-08-04 23:33:31 +05:30
Dawid Jurczak 238139be09 [DSE][NFC] Clean up DeadStoreElimination from unused variables
Differential Revision: https://reviews.llvm.org/D106446
2021-08-04 19:44:40 +02:00
Craig Topper 643ce70a64 [RISCV] Remove the _COMMUTABLE and _TA versions of FMA and wide FMA vector instructions.
Use a tail policy operand instead. Inspired by the work in D105092,
but without the intrinsic interface changes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106512
2021-08-04 10:39:50 -07:00
Jessica Paquette d9279843b1 [AArch64][GlobalISel] Widen G_PHI before clamping it during legalization
This allows us to handle weird types like s88; we first widen to s128, then
clamp back down to s64.

https://godbolt.org/z/9xqbP46Mz

Also this makes it possible for GISel to legalize the case in pr48188.ll. It
now does the same thing as SDAG, although regalloc chooses different registers.

Differential Revision: https://reviews.llvm.org/D107417
2021-08-04 10:25:14 -07:00
Jessica Paquette 7d97de60b3 [AArch64][GlobalISel] Widen G_FPTO*I before clamping
Going through our legalization rules and doing some cleanup.

Widening and then clamping is usually easier than clamping and then widening.

This allows us to legalize some weird types like s88.

Differential Revision: https://reviews.llvm.org/D107413
2021-08-04 10:19:26 -07:00
Petr Hosek 6660cec568 [InstrProfiling] Emit bias variable eagerly
Rather than emitting the bias variable lazily as needed, emit it
eagerly. This allows profile runtime to refer to this variable
unconditionally without having to use the weak reference. The bias
variable is in a COMDAT so there'll never be more than one instance,
and if it's not needed, linker should be able to GC it, so the overhead
should be minimal.

Differential Revision: https://reviews.llvm.org/D107377
2021-08-04 10:17:08 -07:00
Andrea Di Biagio 7a1a35a1d1 [X86][SchedModel] Add missing ReadAdvance for some arithmetic ops (PR51318 and PR51322).
This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in
the RM/MR variants of ADC/SBB (PR51318)

This also fixes the absence of ReadAdvance for the register operand of
RMW arithmetic instructions (PR51322).

Differential Revision: https://reviews.llvm.org/D107367
2021-08-04 17:50:22 +01:00
jamesluox ee7d20e846 [CSSPGO] Migrate and refactor the decoder of Pseudo Probe
Migrate pseudo probe decoding logic in llvm-profgen to MC, so other LLVM-base program could reuse existing codes. Redesign object layout of encoded and decoded pseudo probes.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D106861
2021-08-04 09:21:34 -07:00
Sander de Smalen fe6ae81ef3 [InstCombine] Fix vscale zext/sext optimization when vscale_range is unbounded.
According to the LangRef, a (vscale_range) value of 0 means unbounded.

This patch additionally cleans up the test file vscale_sext_and_zext.ll.
2021-08-04 17:17:37 +01:00
Bradley Smith d9cc5d84e4 [AArch64][SVE] Combine bitcasts of predicate types with vector inserts/extracts of loads/stores
An insert subvector that is inserting the result of a vector predicate
sized load into undef at index 0, whose result is casted to a predicate
type, can be combined into a direct predicate load. Likewise the same
applies to extract subvector but in reverse.

The purpose of this optimization is to clean up cases that will be
introduced in a later patch where casts to/from predicate types from i8
types will use insert subvector, rather than going through memory early.

This optimization is done in SVEIntrinsicOpts rather than InstCombine to
re-introduce scalable loads as late as possible, to give other
optimizations the best chance possible to do a good job.

Differential Revision: https://reviews.llvm.org/D106549
2021-08-04 15:51:14 +00:00
Simon Wallis 9269752671 [AArch64] Fix assert AArch64TargetLowering::ReplaceNodeResults
Don't know how to custom expand this
UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788

The fix is to provide missing expansions for:
  case ISD::STRICT_FP_TO_UINT:
  case ISD::STRICT_FP_TO_SINT:

A test case is provided.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107452
2021-08-04 16:18:19 +01:00
Chris Jackson 21ee38e24f [DebugInfo][LSR] Avoid crashes on large integer inputs
SCEV-based salvaging in LSR translates SCEVs to DIExpressions. SCEVs may
contain very large integers but the translation does not support
integers greater than 64 bits. This patch adds checks to ensure
conversions of these large integers is not attempted. A regression test
is added to ensure no such translation is attempted.

Reviewed by: StephenTozer

PR: https://bugs.llvm.org/show_bug.cgi?id=51329

Differential Revision: https://reviews.llvm.org/D107438
2021-08-04 15:51:22 +01:00
Reshabh Sharma d42e70b3d3 [AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682
2021-08-04 19:53:33 +05:30
Roman Lebedev 35c0848b57
[NFC][X86] combineX86ShuffleChain(): hoist Mask variable higher up
Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef`
to it is confusing. Instead, just move the `Mask` vector higher up,
and change the code that earlier had no access to it but now does
to use `Mask` instead of `BaseMask`.

This has no other intentional changes.
2021-08-04 17:15:12 +03:00
Roman Lebedev 916cdc3d4b
[NFC][X86] combineX86ShuffleChain(): rename inner Mask to avoid future shadowing
I want to hoist `Mask` variable higher up,
but then it would clash with this one.
So let's rename this one first.

There are no other intentional changes here other than said rename.
2021-08-04 17:15:12 +03:00
Tomas Matheson 40650f27b5 [ARM][atomicrmw] Fix CMP_SWAP_32 expand assert
This assert is intended to ensure that the high registers are not
selected when it is passed to one of the thumb UXT instructions. However
it was triggering even for 32 bit where no UXT instruction is emitted.

Fixes PR51313.

Differential Revision: https://reviews.llvm.org/D107363
2021-08-04 15:02:02 +01:00
Roman Lebedev f819e4c7d0
[X86] combineX86ShuffleChain(): canonicalize mask elts picking from splats
Given a shuffle mask, if it is picking from an input that is splat
given the current granularity of the shuffle, then adjust the mask
to pick from the same lane of the input as the mask element is in.
This may result in a shuffle being simplified into a blend.

I believe this is correct given that the splat detection matches the one
just above the new code,

My basic thought is that we might be able to get less regressions
by handling multiple insertions of the same value into a vector
if we form broadcasts+blend here, as opposed to D105390,
but i have not really thought this through,
and did not try implementing it yet.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107009
2021-08-04 16:55:04 +03:00
David Green eeddcba525 [RDA] Attempt to make RDA subreg aware
This attempts to make more of RDA aware of potentially overlapping
subregisters. Some of this was already in place, with it iterating
through MCRegUnitIterators. This also replaces calls to
LiveRegs.contains(..) with !LiveRegs.available(..), and updates the
isValidRegUseOf and isValidRegDefOf to search subregs.

Differential Revision: https://reviews.llvm.org/D107351
2021-08-04 14:21:32 +01:00
Simon Pilgrim 8cd40ece70 [X86] Rename X86 tuning feature flag FeatureHasFastGather -> FeatureFastGather
Match the naming style used by the other 'FeatureFast/FeatureSlow' tuning flags.
2021-08-04 13:07:50 +01:00
Simon Pilgrim 17e8ac0703 [X86] Move FeatureFastBEXTR from bdver2 features to tuning
Noticed while looking at the feature flag renaming suggested in D107370
2021-08-04 13:07:49 +01:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Simon Pilgrim fc8dee1ebb [X86] Split Subtarget ISA / Security / Tuning Feature Flags Definitions. NFC
Our list of slow/fast tuning feature flags has become pretty extensive and is randomly interleaved with ISA and Security (Retpoline etc.) flags, not even based on when the ISAs/flags were introduced, making it tricky to locate them. Plus we started treating tuning flags separately some time ago, so this patch tries to group the flags to match.

I've left them mostly in the same order within each group - I'm happy to rearrange them further if there are specific ISA or Tuning flags that you think should be kept closer together.

Differential Revision: https://reviews.llvm.org/D107370
2021-08-04 11:16:36 +01:00
Tim Northover d7b0e5525a X86: fix frame offset calculation with mandatory tail calls
If there's a region of the stack reserved for potential tail call arguments
(only the case when we guarantee tail calls will be honoured), this is right
next to the incoming stored return address, not necessarily next to the
callee-saved area, so combining the two into a single figure leads to incorrect
offsets in some edge cases.
2021-08-04 10:02:42 +01:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Sjoerd Meijer 30fbb06979 [FuncSpec] Support specialising recursive functions
This adds support for specialising recursive functions. For example:

    int Global = 1;
    void recursiveFunc(int *arg) {
      if (*arg < 4) {
        print(*arg);
        recursiveFunc(*arg + 1);
      }
    }
    void main() {
      recursiveFunc(&Global);
    }

After 3 iterations of function specialisation, followed by inlining of the
specialised versions of recursiveFunc, the main function looks like this:

    void main() {
      print(1);
      print(2);
      print(3);
    }

To support this, the following has been added:
- Update the solver and state of the new specialised functions,
- An optimisation to propagate constant stack values after each iteration of
  function specialisation, which is necessary for the next iteration to
  recognise the constant values and trigger.

Specialising recursive functions is (at the moment) controlled by option
-func-specialization-max-iters and is opt-in for compile-time reasons. I.e.,
the default is -func-specialization-max-iters=1, but for the example above we
would need to use -func-specialization-max-iters=3. Future work is to see if we
can increase the default, or improve the cost-model/heuristics to control
compile-times.

Differential Revision: https://reviews.llvm.org/D106426
2021-08-04 08:07:04 +01:00
Senran Zhang 486b6013f9 [Support] Initialize common options in `getRegisteredOptions`
This allows users accessing options in libSupport before invoking
`cl::ParseCommandLineOptions`, and also matches the behavior before
D105959.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106334
2021-08-03 23:59:10 -07:00
hsmahesha 596e61c332 [AMDGPU] Ignore call graph node which does not have function info.
While collecting reachable callees (from kernels), ignore call graph node which
does not have associated function or associated function is not a definition.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D107329
2021-08-04 10:22:33 +05:30
Senran Zhang df4e0beaeb [NFC][ConstantFold] Check getAggregateElement before getSplatValue call
Constant::getSplatValue has O(N) time complexity in the worst case,
where N is the # of elements in a vector. So we call
Constant::getAggregateElement first and return earlier if possible to
avoid unnecessary getSplatValue calls.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107252
2021-08-03 21:52:14 -07:00
Heejin Ahn 9bd02c433b [WebAssembly] Misc. cosmetic changes in EH (NFC)
- Rename `wasm.catch` intrinsic to `wasm.catch.exn`, because we are
  planning to add a separate `wasm.catch.longjmp` intrinsic which
  returns two values.
- Rename several variables
- Remove an unnecessary parameter from `canLongjmp` and `isEmAsmCall`
  from LowerEmscriptenEHSjLj pass
- Add `-verify-machineinstrs` in a test for a safety measure
- Add more comments + fix some errors in comments
- Replace `std::vector` with `SmallVector` for cases likely with small
  number of elements
- Renamed `EnableEH`/`EnableSjLj` to `EnableEmEH`/`EnableEmSjLj`: We are
  soon going to add `EnableWasmSjLj`, so this makes the distincion
  clearer

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D107405
2021-08-03 21:03:46 -07:00
Arthur Eubanks ad25344620 [MC][CodeGen] Emit constant pools earlier
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.

We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.

Fixes PR51208

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107314
2021-08-03 20:55:31 -07:00
Jacob Hegna b16c37fa2c [MLGO] Update the current model url for the Oz inliner model. 2021-08-04 03:09:00 +00:00
Jessica Paquette 5643736378 [AArch64][GlobalISel] Widen G_SELECT before clamping it
This allows us to handle the s88 G_SELECTS:

https://godbolt.org/z/5s18M4erY

Weird types like this can result in weird merges.

Widening to s128 first and then clamping down avoids that situation.

Differential Revision: https://reviews.llvm.org/D107415
2021-08-03 18:31:17 -07:00
Shimin Cui 2d9759c790 [GlobalOpt] Fix the load types when OptimizeGlobalAddressOfMalloc
Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may not be true any more (as seen in the test case), and we miss the code to handle this, This is to fix that.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107397
2021-08-03 19:22:53 -04:00
Craig Topper b818da27ab [SimplifyCFG] Enable switch to lookup table for more types.
This transform has been restricted to legal types since
https://reviews.llvm.org/rG65df808f6254617b9eee931d00e95d900610b660
in 2012.

This is particularly restrictive on RISCV64 which only has i64
as a legal integer type. i32 is a very common type in code
generated from C, but we won't form a lookup table with it.
This also effects other common types like i8/i16 types on ARM,
AArch64, RISCV, etc.

This patch proposes to allow power of 2 types larger than 8 bit, if
they will fit in the largest legal integer type in DataLayout.
These types are common in C code so generally well handled in
the backends.

We could probably do this for other types like i24 and rely on
alignment and padding to allow the backend to use a single wider
load. This isn't my main concern right now and it will need more
tests.

We could also allow larger types up to some limit and let the
backend split into multiple loads, but we need to define that
limit. It's also not my main concern right now.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107233
2021-08-03 15:35:16 -07:00
Evandro Menezes 63a5ac4e0d [RISCV] Add scheduling resources for V
Add the scheduling resources for the V extension instructions.

Differential Revision: https://reviews.llvm.org/D98002
2021-08-03 15:47:51 -05:00
modimo f5b8a3125a [ThinLTO] Add TimeTrace for Thinlink step
Results from Clang self-build:

{F17435948}

Testing:
ninja check-all

Reviewed By: anton-afanasyev

Differential Revision: https://reviews.llvm.org/D104428
2021-08-03 13:20:04 -07:00
Alexey Bataev 871ea69803 [SLP]Do not emit extra shuffle for insertelements vectorization.
If the vectorized insertelements instructions form indentity subvector
(the subvector at the beginning of the long vector), it is just enough
to extend the vector itself, no need to generate inserting subvector
shuffle.

Differential Revision: https://reviews.llvm.org/D107344
2021-08-03 13:18:41 -07:00
Alexey Bataev 7d9d926a18 Revert "[SLP]Improve graph reordering."
This reverts commit e408d1dfab and
2 other (4b25c11321 and
c2deb2afaf) related to fix the problem with the
reordering shuffles.
2021-08-03 12:13:43 -07:00
Sami Tolvanen 7ce1c4da77 ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Relands 700d07f8ce with -msvc targets
fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-08-03 11:35:30 -07:00
Dylan Fleming 3943a74666 [InstCombine] Fixed select + masked load fold failure
Fixed type assertion failure caused by trying to fold a masked load with a
select where the select condition is a scalar value

Reviewed By: sdesmalen, lebedev.ri

Differential Revision: https://reviews.llvm.org/D107372
2021-08-03 19:06:12 +01:00
Philip Reames 223835f08b [runtimeunroll] A bit of style cleanup to simplify a following change [NFC]
Use for-range, use the idiomatic pattern for non-loop values, etc..
2021-08-03 10:28:46 -07:00
David Green bd07c2e266 [AArch64] Prefer fmov over orr v.16b when copying f32/f64
This changes the lowering of f32 and f64 COPY from a 128bit vector ORR to
a fmov of the appropriate type. At least on some CPU's with 64bit NEON
data paths this is expected to be faster, and shouldn't be slower on any
CPU that treats fmov as a register rename.

Differential Revision: https://reviews.llvm.org/D106365
2021-08-03 17:25:40 +01:00
Craig Topper deaeb16d88 [RISCV] Indicate that RISCVMergeBaseOffsetOpt preserves the CFG.
Return false from runOnFunction if nothing changed. Curiously
we already returned a bool from detectAndFoldOffset, but didn't
use it.

Fix a couple breaks after returns that I saw while auditing
detectAndFoldOffset.

Differential Revision: https://reviews.llvm.org/D107303
2021-08-03 08:32:36 -07:00
Simon Pilgrim 11396641e4 [DAG] Cleanup DAGCombiner::CombineConsecutiveLoads early-outs. NFCI.
We had some similar hasOneUse/isNON_EXTLoad early-outs spread out over different parts of the method - we should pull them all together.

Noticed while triaging PR45116
2021-08-03 13:47:55 +01:00
Krishna 946fd4ea65 Revert "[InstCombine] Remove nnan requirement for transformation to fabs from select"
This reverts commit 6180ce2e2a.
2021-08-03 18:08:11 +05:30
Krishna d99260641b [InstCombine] Fold phi ( inttoptr/ptrtoint x ) to phi (x)
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support to specific cases where this optimization works.

In this patch, we focus on phi-node operands with inttoptr casts.
We know that ptrtoint( inttoptr( ptrtoint x) ) is same as ptrtoint (x).
So, we want to remove this roundtrip cast which goes through phi-node.

Reviewed By: aqjune

Differential Revision: https://reviews.llvm.org/D106289
2021-08-03 17:52:59 +05:30
Krishna 6180ce2e2a [InstCombine] Remove nnan requirement for transformation to fabs from select
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).

Differential Revision: https://reviews.llvm.org/D106872
2021-08-03 17:52:58 +05:30
Simon Pilgrim d3917bbfc6 [X86] Add title comment to separate the "CPU Families" features from the other subtarget features. NFCI.
Hopefully we can get rid of these some day...
2021-08-03 12:53:57 +01:00
Fraser Cormack cba6aab971 [RISCV] Support simple fractional steps in matching VID sequences
This patch extends the optimization of VID-sequence BUILD_VECTORs
introduced in D104921 to include simple fractional steps composed of a
separated integer numerator and denominator.

A notable limitation in this sequence detection is that only sequences
with steps N/1 or 1/D are found, meaning that the step between elements
and the frequency with which it changes is consistent across the whole
sequence. Fractional steps such as 2/3 won't be matched as those would
involve more complex tracking of state or some level of backtracking.

As is stands, however, this patch is sufficient to match common
interleave-type shuffle indices, for example matching `<0,0,1,1>` (or
commonly `<0,u,1,u>` or `<u,0,u,1>`) to an index sequence divided by 2.

While the optimization is relatively `undef`-tolerant, due to greedy
pattern-matching there even are some simple patterns which confuse the
sequence detection into identifying either a suboptimal sequence or no
sequence at all.

Currently only fractional-step sequences identified as having a
power-of-two denominator are actually lowered to RVV instructions. This
is to avoid introducing divisions into the generated code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106533
2021-08-03 10:38:24 +01:00
Jason Molenda 0d8cd4e2d5 [AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted
Add a comment when there is a shifted value,
    add x9, x0, #291, lsl #12 ; =1191936
but not when the immediate value is unshifted,
    subs x9, x0, #256 ; =256
when the comment adds nothing additional to the reader.

Differential Revision: https://reviews.llvm.org/D107196
2021-08-03 02:28:46 -07:00
Esme-Yi 69396896fb [llvm-readobj][XCOFF] Fix the error dumping for the first
item of StringTable.

Summary: For the string table in XCOFF, the first 4 bytes
contains the length of the string table, so we should
print the string entries from fifth bytes. This patch
also adds tests for llvm-readobj dumping the string
table.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D105522
2021-08-03 09:08:58 +00:00
David Sherwood 0156f91f3b [NFC] Rename enable-strict-reductions to force-ordered-reductions
I'm renaming the flag because a future patch will add a new
enableOrderedReductions() TTI interface and so the meaning of this
flag will change to be one of forcing the target to enable/disable
them. Also, since other places in LoopVectorize.cpp use the word
'Ordered' instead of 'strict' I changed the flag to match.

Differential Revision: https://reviews.llvm.org/D107264
2021-08-03 09:33:01 +01:00
Cullen Rhodes a02bbeeae7 [AArch64][AsmParser] NFC: Use helpers in matrix tile list parsing 2021-08-03 08:13:01 +00:00
Jay Foad 40202b13b2 [AMDGPU] Legalize operands of V_ADDC_U32_e32 and friends
These instructions have an implicit use of vcc which counts towards the
constant bus limit. Pre gfx10 this means that the explicit operands
cannot be sgprs. Use the custom inserter hook to call legalizeOperands
to enforce that restriction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51217

Differential Revision: https://reviews.llvm.org/D106868
2021-08-03 09:04:52 +01:00
Paulo Matos d3a0a65bf0 Reland: "[WebAssembly] Add new pass to lower int/ptr conversions of reftypes"
Add new pass LowerRefTypesIntPtrConv to generate debugtrap
instruction for an inttoptr and ptrtoint of a reference type instead
of erroring, since calling these instructions on non-integral pointers
has been since allowed (see ac81cb7e6).

Differential Revision: https://reviews.llvm.org/D107102
2021-08-03 09:20:51 +02:00
jacquesguan 7900ee0b61 [RISCV] Teach VSETVLI insertion to merge the unused VSETVLI with the one need to be insert after it.
If a vsetvli instruction is not compatible with the next vector instruction,
and there is no other things that may update or use VL/VTYPE, we could merge
it with the next vsetvli instruction that should be insert for the vector
instruction.

This commit only merge VTYPE with the former vsetvli instruction which has
the same VL.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106857
2021-08-03 12:06:59 +08:00
luxufan 0023caf952 [RuntimeDyldChecker] Delete comparision of integers of different signs 2021-08-03 11:38:25 +08:00
luxufan f4e418ac1e [RuntimeDyldChecker] Support offset in decode_operand expr
In RISCV's relocations, some relocations are comprised of two relocation types. For example, R_RISCV_PCREL_HI20 and R_RISCV_PCREL_LO12_I compose a PC relative relocation. In general the compiler will set a label in the position of R_RISCV_PCREL_HI20. So, to test the R_RISCV_PCREL_LO12_I relocation, we need decode instruction at position of the label points to R_RISCV_PCREL_HI20 plus 4 (the size of a riscv non-compress instruction).

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D105528
2021-08-03 11:25:51 +08:00
Shimin Cui 7ce98cf56e [GlobalOpt] Fix the assert for stored once non-pointer to global address
This is to fix the assert @bjope reported due to the code change of https://reviews.llvm.org/D106589. The test case from @bjope is also included.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107302
2021-08-02 19:23:29 -04:00
Eli Friedman 1f62af6346 [AArch64][SelectionDAG] Support passing/returning scalable vectors with unusual types.
This adds handling for two cases:

1. A scalable vector where the element type is promoted.
2. A scalable vector where the element count is odd (or more generally,
   not divisble by the element count of the part type).

(Some element types still don't work; for example, <vscale x 2 x i128>,
or <vscale x 2 x fp128>.)

Differential Revision: https://reviews.llvm.org/D105591
2021-08-02 15:53:16 -07:00
Roman Lebedev 6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Roman Lebedev 4ba3326f17
[InstCombine] `vector_reduce_{or,and}(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
This allows the expansion logic to actually trigger if the argument
was extended from i1 element type, like the rest of the reductions expect.

Alive2 agrees:
https://alive2.llvm.org/ce/z/wcfews (or zext)
https://alive2.llvm.org/ce/z/FCXNFx (or sext)
https://alive2.llvm.org/ce/z/f26zUY (and zext)
https://alive2.llvm.org/ce/z/jprViN (and sext)
2021-08-03 00:54:35 +03:00
Jessica Paquette bd13c8e610 [AArch64][GlobalISel] Emit extloads for ZExt/SExt values in assignValueToAddress
When a value is expected to be extended, we should emit an extended load rather
than a normal G_LOAD.

Add checklines to arm64-abi.ll which show that we now emit the correct loads.

For ease of comparison: https://godbolt.org/z/8WvY6EfdE

Differential Revision: https://reviews.llvm.org/D107313
2021-08-02 14:48:44 -07:00
Roman Lebedev 554fc9ad0a
[InstCombine] `vector_reduce_smax(?ext(<n x i1>))` --> `?ext(vector_reduce_{and,or}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/3oqir9 (self)
https://alive2.llvm.org/ce/z/6cuI5m (zext)
https://alive2.llvm.org/ce/z/4FL8rD (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Roman Lebedev f47b7b6d10
[InstCombine] `vector_reduce_smin(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/noXtZ8 (self)
https://alive2.llvm.org/ce/z/JNrN6C (zext)
https://alive2.llvm.org/ce/z/58snuN (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Nikita Popov c7770574f9 Revert "[unroll] Move multiple exit costing into consumer pass [NFC]"
This reverts commit 76940577e4.

This causes Transforms/LoopUnroll/ARM/multi-blocks.ll to fail.
2021-08-02 22:23:34 +02:00
Chang-Sun Lin, Jr b58eda39eb [ValueTracking] Fix computeConstantRange to use "may" instead of "always" semantics for llvm.assume
ValueTracking should allow for value ranges that may satisfy
llvm.assume, instead of restricting the ranges only to values that
will always satisfy the condition.

Differential Revision: https://reviews.llvm.org/D107298
2021-08-02 22:20:17 +02:00
Roman Lebedev b9b7162b8b
[InstCombine] `vector_reduce_umax(?ext(<n x i1>))` --> `?ext(vector_reduce_or(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/NbBaeT (self)
https://alive2.llvm.org/ce/z/iEaig4 (zext)
https://alive2.llvm.org/ce/z/meGb3y (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:23 +03:00
Roman Lebedev 0c13798056
[InstCombine] `vector_reduce_umin(?ext(<n x i1>))` --> `?ext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/XxUScW (self)
https://alive2.llvm.org/ce/z/3usTF- (zext)
https://alive2.llvm.org/ce/z/GVxwQz (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:22 +03:00
Philip Reames 76940577e4 [unroll] Move multiple exit costing into consumer pass [NFC]
This aligns the multiple exit costing with all the other cost decisions.  Note that UnrollAndJam, which is the only other caller of the original home of this code, unconditionally bails out of multiple exit loops.
2021-08-02 12:46:23 -07:00
Nikita Popov 380b8a603c [DFAJumpThreading] Use SmallPtrSet for Visited (NFC)
This set is only used for contains checks, so there is no need to
use std::set.
2021-08-02 21:30:25 +02:00
Nikita Popov 3f7aea1a37 [DFAJumpThreading] Use insert return value (NFC)
Rather than find + insert. Also use range based for loop.
2021-08-02 21:21:21 +02:00
Nikita Popov 84602f98c6 [DFAJumpThreading] Remove unnecessary includes (NFC)
This file uses neither unordered_map nor unordered_set.
2021-08-02 21:13:30 +02:00
Nikita Popov e97524cba2 [DFAJumpThreading] Mark DT as preserved in LegacyPM
It is marked as preserved in NewPM, but not LegacyPM.
2021-08-02 21:13:30 +02:00
Roman Lebedev 469793efa7
[InstCombine] `vector_reduce_mul(?ext(<n x i1>))` --> `zext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/PDansB (self)
https://alive2.llvm.org/ce/z/55D-Xc (zext)
https://alive2.llvm.org/ce/z/LxG3-r (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 21:57:51 +03:00
Philip Reames 9016beaa24 [unrollruntime] Pull out a helper function for readability and eventual reuse [nfc] 2021-08-02 11:47:27 -07:00
Paulo Matos 245f2ee647 Revert "[WebAssembly] Add new pass to lower int/ptr conversions of reftypes"
This reverts commit ce1c59dea6.
2021-08-02 20:12:25 +02:00
Philip Reames ebc4c4e3b0 [unroll] Add clarifying comment
The option to not preserve LCSSA is in fact not tested at all in upstream.  I was tempted to just remove the code entirely, but realized I didn't need to for my actual goal.
2021-08-02 10:44:56 -07:00
Alexander Yermolovich 5a865b0b1e [DWARF] Don't process .debug_info relocations for DWO Context
When we build with split dwarf in single mode the .o files that contain both "normal" debug sections and dwo sections, along with relocaiton sections for "normal" debug sections.
When we create DWARF context in DWARFObjInMemory we process relocations and store them in the map for .debug_info, etc section.
For DWO Context we also do it for non dwo dwarf sections. Which I believe is not necessary. This leads to a lot of memory being wasted. We observed 70GB extra memory being used.

I went with context sensitive approach, flag is passed in. I am not sure if it's always safe not to process relocations for regular debug sections if Obj contains .dwo sections.
If it is alternatvie might be just to scan, in constructor, sections and if there are .dwo sections not to process regular debug ones.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D106624
2021-08-02 10:41:47 -07:00
Paulo Matos ce1c59dea6 [WebAssembly] Add new pass to lower int/ptr conversions of reftypes
Add new pass LowerRefTypesIntPtrConv to generate trap
instruction for an inttoptr and ptrtoint of a reference type instead
of erroring, since calling these instructions on non-integral pointers
has been since allowed (see ac81cb7e6).

Differential Revision: https://reviews.llvm.org/D107102
2021-08-02 19:40:00 +02:00
Florian Hahn bb725c9803
[VPlan] Use defined and ops VPValues to print VPInterleaveRecipe.
This patch updates VPInterleaveRecipe::print to print the actual defined
VPValues for load groups and the store VPValue operands for store
groups.

The IR references may become outdated while transforming the VPlan and
the defined and stored VPValues always are up-to-date.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D107223
2021-08-02 18:36:36 +01:00
Roman Lebedev 1e801439be
[InstCombine] `xor` reduction w/ i1 elt type is a parity check
For i1 element type, `xor` and `add` are interchangeable
(https://alive2.llvm.org/ce/z/e77hhQ), so we should treat it just like
an `add` reduction and consistently transform them both:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)

Though, let's emit the IR that is similar to the one we produce for
`vector_reduce_add(<n x i1>)`.

See https://bugs.llvm.org/show_bug.cgi?id=51259
2021-08-02 20:21:37 +03:00
Thomas Lively 417e500668 [WebAssembly] Compute known bits for SIMD bitmask intrinsics
This optimizes out the mask when the result of a bitmask is interpreted as an i8
or i16 value. Resolves PR50507.

Differential Revision: https://reviews.llvm.org/D107103
2021-08-02 09:52:34 -07:00
David Green c423a586a7 [ARM] Remove setPreservesCFG from ARMBlockPlacement
As of 2829391840 it no longer preserves the CFG, needing to
split blocks in order to add DLS instructions.
2021-08-02 14:15:45 +01:00
Irina Dobrescu b01417d3c5 [AArch64] Optimise min/max lowering in ISel
Differential Revision: https://reviews.llvm.org/D106561
2021-08-02 13:40:21 +01:00
Carl Ritson 675c942373 [AMDGPU] Disable NSA for BVH instructions when appropriate
Check maximum NSA size when selecting NSA or non-NSA BVH instructions.

Differential Revision: https://reviews.llvm.org/D103230
2021-08-02 20:09:26 +09:00
Florian Mayer 66b4aafa2e [hwasan] Detect use after scope within function.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D105201
2021-08-02 11:34:12 +01:00
Simon Pilgrim 7397dcb403 [TTI] Add basic SK_InsertSubvector shuffle mask recognition
This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix a x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188

This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion.

The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic.

The widening case will be addressed in a follow up patch that treats the cost as 0.

Masks with a high number of undef elts will still struggle to match optimal subvector widths - its currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors.

Differential Revision: https://reviews.llvm.org/D107228
2021-08-02 11:23:44 +01:00
Simon Pilgrim 0579050116 Fix MSVC signed/unsigned comparison warning. NFCI. 2021-08-02 11:23:43 +01:00
Rosie Sumpter f117ed542f [LoopFlatten] Fix missed LoopFlatten opportunity
When the limit of the inner loop is a known integer, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.

Differential Revision: https://reviews.llvm.org/D105802
2021-08-02 11:09:54 +01:00
David Green 2829391840 [ARM] Revert WLSTP to DLSTP if the target block is out of range
If the block target for a WLSTP instruction is known to be out of range,
and cannot be fixed by the ARMBlockPlacementPass, we can relax it to a
DLSTP (and cmp/branch) to still allow the creation of tail predicated
loops. That is what this patch does, adding extra revert code to the
fallback path of ARMBlockPlacementPass.

Due to the code produced when reverting, this creates a DLSTP between a
Bcc and a Br. As a DLS isn't necessarily a terminator we need to split
the block to move the DLS/Br into.

Differential Revision: https://reviews.llvm.org/D104709
2021-08-02 10:59:52 +01:00
Cullen Rhodes 7ed0120d84 [AArch64][AsmParser] NFC: Parser.Lex() -> Lex()
Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D107146
2021-08-02 09:48:41 +00:00
Max Kazantsev c5b63714b5 [GC][NFC] Make getGCStrategy by name available in IR
We might want to use info from GC strategy in middle end analysis.
The motivation for this is provided in D99135: we may want to ask
a GC if it's going to work with a given pointer (currently this code
makes naive check by the method name).

Differetial Revision: https://reviews.llvm.org/D100559
Reviewed By: reames
2021-08-02 14:26:04 +07:00
Carl Ritson a441de6d94 [AMDGPU][GlobalISel] Add missing default mapping for BVH intrinsics
Application of default mapping to BVH intrinsics was missing.
Copy parts of SelectionDAG test to GlobalISel test as these would
have indicated this error.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D107211
2021-08-02 12:43:38 +09:00
Freddy Ye d268c20070 [X86] Support auto-detect for tigerlake and alderlake
Differential Revision: https://reviews.llvm.org/D107245
2021-08-02 11:01:01 +08:00
Shimin Cui 732b05555c [GlobalOpt] support ConstantExpr use of global address for OptimizeGlobalAddressOfMalloc
I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D106589
2021-07-31 18:42:02 -04:00
Hsiangkai Wang 8b33839f01 [RISCV] Rename vector inline constraint from 'v' to 'vr' and 'vm' in IR.
Differential Revision: https://reviews.llvm.org/D107139
2021-08-01 05:58:17 +08:00
Eli Friedman bdd55b2f18 Fix the default alignment of i1 vectors.
Currently, the default alignment is much larger than the actual size of
the vector in memory.  Fix this to use a sane default.

For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.

This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway.  If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).

Differential Revision: https://reviews.llvm.org/D88994
2021-07-31 14:09:59 -07:00
Eli Friedman 2a2847823f [ConstantFold] Get rid of special cases for sizeof etc.
Target-dependent constant folding will fold these down to simple
constants (or at least, expressions that don't involve a GEP).  We don't
need heroics to try to optimize the form of the expression before that
happens.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51232 .

Differential Revision: https://reviews.llvm.org/D107116
2021-07-31 13:20:47 -07:00
Sanjay Patel 7f55557765 [Analysis] improve function signature checking for snprintf
The check for size_t parameter 1 was already here for snprintf_chk,
but it wasn't applied to regular snprintf. This could lead to
mismatching and eventually crashing as shown in:
https://llvm.org/PR50885
2021-07-31 15:17:20 -04:00
Craig Topper 593059b328 [RISCV] Rename RISCVISD::FCVT_W_RV64 to FCVT_W_RTZ_RV64. NFC
fcvt.w(u) supports multiple rounding modes, but the ISD node
doesn't encode that. So name it to match the rounding mode it uses.
2021-07-31 11:14:59 -07:00
Sanjay Patel f2a322bfcf [SROA] prevent crash on large memset length (PR50910)
I don't know much about this pass, but we need a stronger
check on the memset length arg to avoid an assert. The
current code was added with D59000.
The test is reduced from:
https://llvm.org/PR50910

Differential Revision: https://reviews.llvm.org/D106462
2021-07-31 14:07:30 -04:00
Sanjay Patel a22c99c3c1 [InstCombine] canonicalize cmp-of-bitcast-of-vector-cmp to use zero constant
We can invert a compare constant and preserve the logic
as shown in this sampling:
https://alive2.llvm.org/ce/z/YAXbfs
(In theory, we could deal with non-all-ones/zero as well,
but it doesn't seem worthwhile.)

I noticed this as a part of the x86 codegen difference in
https://llvm.org/PR51259 - it ends up using "test"
instead of "not + cmp" in that example.

This pattern also shows up in https://llvm.org/PR41312
and https://llvm.org/PR50798 .

Differential Revision: https://reviews.llvm.org/D107170
2021-07-31 13:31:12 -04:00
David Green 15a1d7e839 [ARM] Switch order of creating VADDV and VMLAV.
It can be beneficial to attempt to try the larger VMLAV patterns before
VADDV, in case both may match the same code.
2021-07-31 16:28:52 +01:00
Matt Arsenault ebc17a0d68 GlobalISel: Scalarize unaligned vector stores
This has the same problems and limitations as the load path.
2021-07-31 10:37:15 -04:00
Simon Pilgrim 3a7c82efb8 [DAG] isGuaranteedNotToBeUndefOrPoison - handle ISD::BUILD_VECTOR nodes
If all demanded elements of the BUILD_VECTOR pass a isGuaranteedNotToBeUndefOrPoison check, then we can treat this specific demanded use of the BUILD_VECTOR as guaranteed not to be undef or poison either.

Differential Revision: https://reviews.llvm.org/D107174
2021-07-31 15:08:25 +01:00
Matt Arsenault bc2cb91a20 GlobalISel: Have lowerStore handle some unaligned stores
This is NFC until some of the AMDGPU legalization rules are ripped
out.
2021-07-31 10:01:42 -04:00
Alexandros Lamprineas 7d940432c4 [AArch64] Legalize MVT::i64x8 in DAG isel lowering
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.

Differential Revision: https://reviews.llvm.org/D94097
2021-07-31 09:51:28 +01:00
Alexandros Lamprineas 3094e5389b [AArch64] Add a Machine Value Type for 8 consecutive registers
Adds MVT::i64x8, a Machine Value Type needed for lowering inline assembly
operands which materialize a sequence of eight general purpose registers.

Differential Revision: https://reviews.llvm.org/D94096
2021-07-31 09:51:28 +01:00
Petr Hosek 83302c8489 [profile] Fix profile merging with binary IDs
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.

In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.

Differential Revision: https://reviews.llvm.org/D107143
2021-07-30 18:54:27 -07:00
Petr Hosek d3dd07e3d0 Revert "[profile] Fix profile merging with binary IDs"
This reverts commit dcadd64986.
2021-07-30 18:53:48 -07:00
Petr Hosek dcadd64986 [profile] Fix profile merging with binary IDs
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.

In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.

Differential Revision: https://reviews.llvm.org/D107143
2021-07-30 17:38:53 -07:00
Petr Hosek 6ea2f31f3d Revert "[profile] Fix profile merging with binary IDs"
This reverts commit 89d6eb6f8c, this
seemed to have break a few builders.
2021-07-30 14:32:52 -07:00
Florian Mayer b5b023638a Revert "[hwasan] Detect use after scope within function."
This reverts commit 84705ed913.
2021-07-30 22:32:04 +01:00
Brendon Cahoon c4c379d633 [LoopStrengthReduction] Fix pointer extend asserts
Additional asserts were added to ScalarEvolution to enforce
pointer/int type rules. An assert is triggered when the LSR pass
attempts to extend a pointer SCEV in GenerateTruncates.

This patch changes GenerateTruncates to exit early if the Formaula
contains a ScaledReg or BaseReg with a pointer type.

Differential Revision: https://reviews.llvm.org/D107185
2021-07-30 17:24:08 -04:00
Petr Hosek 89d6eb6f8c [profile] Fix profile merging with binary IDs
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.

In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.

Differential Revision: https://reviews.llvm.org/D107143
2021-07-30 13:30:30 -07:00
Rahman Lavaee 2256b359d7 Explain the symbols of basic block clusters with an example in the header comments.
This prevents from confusion with the ``labels`` option.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D107128
2021-07-30 12:08:04 -07:00
Fangrui Song a1532ed275 [InstrProfiling] Make CountersPtr in __profd_ relative
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.

```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo

# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo

# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```

(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059  # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```

The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.

Differential Revision: https://reviews.llvm.org/D104556
2021-07-30 11:52:18 -07:00
David Green 69cdadddec [ARM] Distribute reductions based on ascending load offset
This distributes reductions based on the relative offset of loads, if
one is found from their operands. Given chains of reductions this will
then sort them in ascending load order, which in turn can help simple
prefetches latch on to increasing strides more easily.

Differential Revision: https://reviews.llvm.org/D106569
2021-07-30 19:50:07 +01:00
Simon Pilgrim afc6b09dee [InstCombine] getMaskedTypeForICmpPair - remove dead code. NFCI.
Ok should be true at this point, so the early-out is dead - replace with an assert.
2021-07-30 19:23:05 +01:00
Simon Pilgrim 3c0b596ecc SelectionDAGDumper.cpp - remove nested if-else return chain. NFCI.
Match style and don't use an else after a return.
2021-07-30 19:23:05 +01:00
Simon Pilgrim 986841cca2 SelectionDAGDumper.cpp - printrWithDepthHelper - remove dead code. NFCI.
Fixes coverity warning - we have an early-out for unsigned depth == 0, so the depth < 1 early-out later on is dead code.
2021-07-30 19:23:04 +01:00
Matt Arsenault e46badd4e9 GlobalISel: Have lowerLoad scalarize unaligned vectors
This could be smarter by picking an ideal type, or at least splitting
the vector in half first. Also handles lower for non-power-of-2,
non-extending vector loads.

Currently this just avoids failing to legalize some odd vector AMDGPU
tests, but is a step towards removing the split logic from the
NarrowScalar logic.
2021-07-30 13:23:29 -04:00
Alexey Bataev 95e5d401ae [SLP]Improve splats vectorization.
Replace insertelement instructions for splats with just single
insertelement + broadcast shuffle. Also, try to merge these instructions
if they come from the same/shuffled gather node.

Differential Revision: https://reviews.llvm.org/D107104
2021-07-30 10:17:45 -07:00
Kerry McLaughlin 9d35594993 Reland "[LV] Use lookThroughAnd with logical reductions"
If a reduction Phi has a single user which `AND`s the Phi with a type mask,
`lookThroughAnd` will return the user of the Phi and the narrower type represented
by the mask. Currently this is only used for arithmetic reductions, whereas loops
containing logical reductions will create a reduction intrinsic using the widened
type, for example:

  for.body:
    %phi = phi i32 [ %and, %for.body ], [ 255, %entry ]
    %mask = and i32 %phi, 255
    %gep = getelementptr inbounds i8, i8* %ptr, i32 %iv
    %load = load i8, i8* %gep
    %ext = zext i8 %load to i32
    %and = and i32 %mask, %ext
    ...

^ this will generate an and reduction intrinsic such as the following:
    call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...)

The same example for an add instruction would create an intrinsic of type i8:
    call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...)

This patch changes AddReductionVar to call lookThroughAnd for other integer
reductions, allowing loops similar to the example above with reductions such
as and, or & xor to vectorize.

Reviewed By: david-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D105632
2021-07-30 18:04:09 +01:00