Commit Graph

149978 Commits

Author SHA1 Message Date
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Usman Nadeem 5420fc4a27 [AArch64][SVE][InstCombine] Unpack of a splat vector -> Scalar extend
Replace vector unpack operation with a scalar extend operation.
  unpack(splat(X)) --> splat(extend(X))

If we have both, unpkhi and unpklo, for the same vector then we may
save a register in some cases, e.g:
  Hi = unpkhi (splat(X))
  Lo = unpklo(splat(X))
   --> Hi = Lo = splat(extend(X))

Differential Revision: https://reviews.llvm.org/D106929

Change-Id: I77c5c201131e3a50de1cdccbdcf84420f5b2244b
2021-08-09 14:58:54 -07:00
Usman Nadeem 85bbc05154 [AArch64][SVE][InstCombine] Move last{a,b} before binop if one operand is a splat value
Move the last{a,b} operation to the vector operand of the binary instruction if
the binop's operand is a splat value. This essentially converts the binop
to a scalar operation.

Example:
    // If x and/or y is a splat value:
    lastX (binop (x, y)) --> binop(lastX(x), lastX(y))

Differential Revision: https://reviews.llvm.org/D106932

Change-Id: I93ff5302f9a7972405ee0d3854cf115f072e99c0
2021-08-09 14:48:41 -07:00
Eli Friedman ac20e56911 [AArch64] Implement FCOPYSIGN for SVE.
I was originally going to try to implement this in target-independent
code, but it's actually sort of tricky to generate the correct sequence
for vectors like nxv2f32.  So just stick this in target-specific code,
at least for now.

Differential Revision: https://reviews.llvm.org/D107608
2021-08-09 12:06:48 -07:00
Arnold Schwaighofer b987c283ae [coro] Correct CurrentBlock tracking bug recently introduced
We use the CurrentBlock to determine whether we have already processed a
block. Don't reuse this variable for setting where we should insert the
rematerialization. The rematerialization block is different to the
current block when we rematerialize for coro suspend block users.

Differential Revision: https://reviews.llvm.org/D107573
2021-08-09 10:41:41 -07:00
Bradley Smith 73ecb9987b [AArch64][SVE] Fix assertion failure when lowering fixed length gather/scatter
The patterns for fixed length gather/scatter with 32-bit offsets and
64-bit memory type are slightly different that the rest of the patterns,
as such the lowering needs to be slightly different to ensure the
correct types are used.

Differential Revision: https://reviews.llvm.org/D107576
2021-08-09 14:05:22 +00:00
Jeremy Morse d4ce9e463d [DWARF] Revert sharing subprograms across CUs
This patch is a revert of e08f205f5c. In that patch, DW_TAG_subprograms
were permitted to be referenced across CU boundaries, to improve stack
trace construction using call site information. Unfortunately, as
documented in PR48790, the way that subprograms are "owned" by dwarf units
is sufficiently complicated that subprograms end up in unexpected units,
invalidating cross-unit references.

There's no obvious way to easily fix this, and several attempts have
failed. Revert this to ensure correct DWARF is always emitted.

Three tests change in addition to the reversion, but they're all very
light alterations.

Differential Revision: https://reviews.llvm.org/D107076
2021-08-09 12:43:43 +01:00
Fraser Cormack 2b4a1d4b86 [RISCV] Improve codegen for shuffles with LHS/RHS splats
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".

This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107449
2021-08-09 10:31:40 +01:00
Luo, Yuanke 53642d5b80 [NFC] Fix the formula for reciprocal calculation.
Differential Revision: https://reviews.llvm.org/D107713
2021-08-09 16:03:56 +08:00
Min-Yih Hsu 7cbcde4aa3 [M68k] Use separate asm operand class for different widths of address
This could help asm parser to pick the correct variant of instruction.
This patch also migrated all the control instructions MC tests.
2021-08-09 00:07:19 -07:00
Min-Yih Hsu cf277f0b31 [M68k][NFC] Coalesce render methods in different asm register op class
And assign RegClass (i.e. operand class for all GPR) as the super class
of ARegClass and DRegClass. Note that this is a NFC change because
actually we already had XRDReg to model either address or data register
operands (as well as test coverage for it). The new super class syntax
added here is just making the relations between three RegClass-es more
explicit.
2021-08-09 00:07:19 -07:00
Cullen Rhodes 1a18bb9270 [AArch64] NFC: Remove DecodeVectorRegisterClass from disassembler
The decoder function and table are the same as FPR128, use that instead.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107644
2021-08-09 06:52:47 +00:00
Simon Atanasyan 990e8025b5 [MC][ELF] Do not error on parsing .debug_* section directive for MIPS
MIPS .debug_* sections should have SHT_MIPS_DWARF section type to
distinguish among sections contain DWARF and ECOFF debug formats, but in
assembly files these sections have SHT_PROGBITS (@progbits) type. Now
assembler shows 'changed section type for ...' error when parsing
`.section .debug_*,"",@progbits` directive for MIPS targets.

The same problem exists for x86-64 target and this patch extends
workaround implemented in D76151. The patch adds one more case
when assembler ignores section types mismatch after `SwitchSection()`
call.

Differential Revision: https://reviews.llvm.org/D107707
2021-08-09 08:54:56 +03:00
Christudasan Devadasan fcf2d5f402 Revert "SROA: Enhance speculateSelectInstLoads"
This reverts commit ffc3fb665d.
2021-08-09 01:13:39 -04:00
Michael Liao b5e470aa2e [LowerMemIntrinsics] Typo fix. 2021-08-08 22:38:58 -04:00
Craig Topper 2f3b738960 [RISCV] Add optimizations for FMV_X_ANYEXTH similar to FMV_X_ANYEXTW_RV64.
This enables the fneg and fabs combines we have for FMV_X_ANYEXTW_RV64.
2021-08-08 18:30:48 -07:00
Craig Topper 88bc29f5f2 [RISCV] Introduce a RISCV CondCode enum instead of using ISD:SET* in MIR. NFC
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.

This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.

My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107400
2021-08-08 17:25:37 -07:00
Craig Topper 20dfe051ab [RISCV] Move the $rs operand of PseudoStore from outs to ins. NFC
This is the data to be stored so it should be an input.

To keep operand order similar between loads and stores, move the temp
register to the first dest operand of floating point loads. Rework
the assembler code accordingly.

This doesn't have any functional effect because this Pseudo is only
used by the assembler which doesn't use ins/outs.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107309
2021-08-08 15:58:24 -07:00
Kazu Hirata d9c9d13365 [DWARF] Remove collectChildrenAddressRanges (NFC)
The last use was removed on Dec 21, 2018 in commit
c3f30a7fc6.
2021-08-08 08:57:32 -07:00
Dorit Nuzman 67278b8a90 [LV] Support Interleaved Store Group With Gaps
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).

The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.

Reviewed by: Ayal Zaks

Differential Revision: https://reviews.llvm.org/D104750
2021-08-08 10:32:02 +03:00
Min-Yih Hsu 657bb7262d [M68k] Separate ADDA from ADD and migrate rest of the arithmetic MC tests
Previously ADD & ADDA (as well as SUB & SUBA) instructions are mixed
together, which not only violated Motorola assembly's syntax but also
made asm parsing more difficult. This patch separates these two kinds of
instructions migrate rest of the tests from
test/CodeGen/M68k/Encoding/Arithmetic to test/MC/M68k/Arithmetic.

Note that we observed minor regressions on codegen quality: Sometimes
isel uses ADD instead of ADDA even the latter can lead to shorter
sequence of code. This issue implies that some isel patterns might need
to be updated.
2021-08-07 17:19:12 -07:00
Craig Topper d4ee84ceee [RISCV] Support FP_TO_S/UINT_SAT for i32 and i64.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of 0 required for the ISD opcodes.

This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.

We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107230
2021-08-07 16:06:00 -07:00
Nikita Popov 88003cea1c [MemCpyOpt] Remove MemDepAnalysis-based implementation
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.

I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.

Differential Revision: https://reviews.llvm.org/D102113
2021-08-07 22:35:44 +02:00
Krishna a9a176ca3b [InstCombine] Remove nnan requirement for transformation to fabs from select
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D106872
2021-08-07 22:38:45 +05:30
Craig Topper 24dfba8d50 [X86] Teach shouldSinkOperands to recognize pmuldq/pmuludq patterns.
The IR for pmuldq/pmuludq intrinsics uses a sext_inreg/zext_inreg
pattern on the inputs. Ideally we pattern match these away during
isel. It is possible for LICM or other middle end optimizations
to separate the extend from the mul. This prevents SelectionDAG
from removing it or depending on how the extend is lowered, we
may not be able to generate an AssertSExt/AssertZExt in the
mul basic block. This will prevent pmuldq/pmuludq from being
formed at all.

This patch teaches shouldSinkOperands to recognize this so
that CodeGenPrepare will clone the extend into the same basic
block as the mul.

Fixes PR51371.

Differential Revision: https://reviews.llvm.org/D107689
2021-08-07 08:45:56 -07:00
Roman Lebedev 0a241e90d4
[NFC][InstCombine] `vector_reduce_xor(?ext(<n x i1>))` --> `?ext(vector_reduce_add(<n x i1>))`
Instead of expanding it ourselves,
we can just forward to `?ext(vector_reduce_add(<n x i1>))`, as per alive2:
https://alive2.llvm.org/ce/z/ymz7zE (self)
https://alive2.llvm.org/ce/z/eKu2v2 (skipped zext)
https://alive2.llvm.org/ce/z/c3BXgc (skipped sext)
2021-08-07 17:31:33 +03:00
Roman Lebedev c6ff867f92
[NFC][InstCombine] Simplify emitted IR for `vector_reduce_xor(?ext(<n x i1>))`
Now that we canonicalize low bit splatting to the form we were emitting
here ourselves, emit simpler IR that will be canonicalized later.

See 1e801439be for proofs:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)
2021-08-07 17:31:24 +03:00
Roman Lebedev e71870512f
[InstCombine] Prefer `-(x & 1)` as the low bit splatting pattern (PR51305)
Both patterns are equivalent (https://alive2.llvm.org/ce/z/jfCViF),
so we should have a preference. It seems like mask+negation is better
than two shifts.
2021-08-07 17:25:28 +03:00
Christudasan Devadasan ffc3fb665d SROA: Enhance speculateSelectInstLoads
Allow the folding even if there is an
intervening bitcast.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106667
2021-08-07 09:09:14 -04:00
Florian Hahn a00aafc30d
[VPlan] Iterate over phi recipes to detect reductions to fix.
After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.

This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100102
2021-08-07 14:06:50 +01:00
Amara Emerson 4c2e01232c [GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs.
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().

We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.
2021-08-07 01:41:02 -07:00
Nemanja Ivanovic 62fe3dcf98 Fix PPC buildbot break caused by 4c4093e6e3
This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
  not an IEEE-compliant type
- Does not have a representation that allows for straightforward
  reinterpreting as an integer and use of integer operations

The result was that this commit broke __builtin_isnan for ppc_fp128
making many valid numeric values report a NaN.

This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is inline with previous semantics.
2021-08-06 22:10:20 -05:00
Steffen Larsen 1b4c85fc02 [NVPTX] Add NVPTX intrinsics for CUDA PTX 6.5 ldmatrix instructions
Adds NVPTX intrinsics for the CUDA PTX `ldmatrix.sync.aligned` instructions added in PTX 6.5.

PTX ISA description of `ldmatrix.sync.aligned`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107046
2021-08-06 16:13:35 -07:00
Sanjay Patel 0369714b31 [InstCombine] reduce vector casting before icmp
There may be some generalizations (see test comments) of these patterns,
but this should handle the cases motivated by:
https://llvm.org/PR51315
https://llvm.org/PR51259

The backend may want to transform differently, but at least for
the x86 examples that I looked at, there does not appear to be
any significant perf diff either way.
2021-08-06 17:09:38 -04:00
Michael Liao 05783e1cfe [amdgpu] Revise the conversion from i64 to f32.
- Replace 'cmp+sel' with 'umin' if possible.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107507
2021-08-06 17:01:47 -04:00
Amara Emerson 2b067e3335 Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG.
DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.
2021-08-06 12:57:53 -07:00
Thomas Johnson f8a4495149 [ARC] Add codegen for llvm.ctlz intrinsic for the ARC backend
Differential Revision: https://reviews.llvm.org/D107611
2021-08-06 12:18:06 -07:00
Artem Belevich 6a9cf21f5a [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.
Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize `memset` intrinsic.

While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.

For now introduce a flag to force-enable the flag and opt-in only CUDA
compilation with NVPTX back-end.

Differential Revision: https://reviews.llvm.org/D106401
2021-08-06 11:13:52 -07:00
Zheng Chen 30b0c455b1 [LoopCacheAnalysis]: handle mismatch type for Numerator and CacheLineSize
fix an assertion due to mismatch type for Numerator and CacheLineSize in loop cache analysis pass.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D107618
2021-08-06 16:51:09 +00:00
Jon Roelofs eae4a44c1d [GlobalISel][KnownBits] Implement G_CTPOP
Implementation copied almost verbatim from ValueTracking.

Differential revision: https://reviews.llvm.org/D107606
2021-08-06 09:48:39 -07:00
Michael Liao d1cacd5928 [MemCpyOpt] Teach memcpyopt to handle loads from the constant memory.
- Loads from the constant memory (either explicit one or as the source
  of memory transfer intrinsics) won't alias any stores.

Reviewed By: asbirlea, efriedma

Differential Revision: https://reviews.llvm.org/D107605
2021-08-06 12:43:52 -04:00
Craig Topper b2ca4dc935 [LegalizeTypes] Add a simple expansion for SMULO when a libcall isn't available.
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.

This can make D107420 unnecessary.

Needs tests, but I wanted to start discussion about D107420.

Reviewed By: FreddyYe

Differential Revision: https://reviews.llvm.org/D107581
2021-08-06 09:43:01 -07:00
David Green 77e8f4eeee [ARM] Define ComplexPatternFuncMutatesDAG
Some of the Arm complex pattern functions call canExtractShiftFromMul,
which can modify the DAG in-place. For this to be valid and handled
successfully we need to define ComplexPatternFuncMutatesDAG.

Differential Revision: https://reviews.llvm.org/D107476
2021-08-06 17:35:11 +01:00
Kazu Hirata 276be84d0a [CodeGen] Remove computeDefOperandLatency (NFC)
The last use was removed on Oct 9, 2016 in commit
5c924d7117.
2021-08-06 08:26:55 -07:00
Jay Foad 57b9107e3f [GlobalISel] Improve widening of cttz/cttz_zero_undef
Differential Revision: https://reviews.llvm.org/D107631
2021-08-06 14:25:56 +01:00
Mircea Trofin ae1a2a09e4 [NFC][MLGO] Make logging more robust
1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same nr of entries

2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.

Differential Revision: https://reviews.llvm.org/D107594
2021-08-06 04:44:52 -07:00
Reshabh Sharma 5173854f19 [AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682
2021-08-06 15:53:33 +05:30
Simon Pilgrim dbce6a8d9d [ARM] Fold insert_subvector to concat_vectors
D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage.

I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.
2021-08-06 11:21:31 +01:00
Simon Pilgrim 18e6a03b1a [X86][AVX] Extract SUBV_BROADCAST constant bits from just the lower subvector range (PR51281)
As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width.

The getTargetConstantBitsFromNode was assuming that the Constant would the same size as the subvector, resulting in the incorrect packing of the per-element bits data.

This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits from Constant of that subvector bitsize.

Differential Revision: https://reviews.llvm.org/D107158
2021-08-06 11:21:31 +01:00
Cullen Rhodes 08bc441174 [AArch64] NFC: drop unnecessary llvm:: namespace prefix on MCInst 2021-08-06 09:23:18 +00:00
David Sherwood 3fd96e1b2e [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
  loop invariant input operands then these are also uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intriniscs:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284
2021-08-06 10:13:15 +01:00
Chuanqi Xu 0fd03feb4b [FuncSpec] Return changed if function is changed by tryToReplaceWithConstant
The may get changed before specialization by RunSCCPSolver. In other
words, the pass may change the function without specialization happens.
Add test and comment to reveal this.
And it may return No Changed if the function get changed by
RunSCCPSolver before the specialization. It looks like a potential bug.

Test Plan: check-all

Reviewed By: https://reviews.llvm.org/D107622

Differential Revision: https://reviews.llvm.org/D107622
2021-08-06 17:00:17 +08:00
David Sherwood 43a5c750d1 Revert "[LoopVectorize] Add support for replication of more intrinsics with scalable vectors"
This reverts commit 95800da914.
2021-08-06 09:48:16 +01:00
Jay Foad 83610d4eb0 [AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107474
2021-08-06 09:40:48 +01:00
Jay Foad 24b67a9024 [AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef
We can improve on the generic splitting by using ffbh/ffbl, which have a
defined result when the input is zero.

Differential Revision: https://reviews.llvm.org/D107442
2021-08-06 09:40:48 +01:00
Jay Foad d77b43c385 [AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32
This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.

Differential Revision: https://reviews.llvm.org/D107441
2021-08-06 09:40:48 +01:00
Jay Foad cd2594e1c6 [GlobalISel] Improve legalization of narrow CTTZ
Differential Revision: https://reviews.llvm.org/D107457
2021-08-06 09:40:48 +01:00
Chuanqi Xu 62fc3e0ad6 [NFC] [FuncSpec] Remove unused variables in isArgumentInteresting 2021-08-06 16:38:20 +08:00
Chuanqi Xu cc3f40bb41 [FuncSpec] Move invariant computation for spec cost out of loop (NFC-ish)
Noticed that the computation for function specialization cost of a
function wouldn't change during the traversal of the arguments for the
function. We could hoist the computation out of the traversal. I
observed about ~1% improvement on compile time for spec2017. But I guess
it may not be precise. This should be NFC and fine.

Reviewed By: Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D107621
2021-08-06 15:43:05 +08:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Florian Hahn 3e58dd19df
[LV] Move reduction PHI node fixup to VPlan::execute (NFC).
All information to fix-up the reduction phi nodes in the vectorized loop
is available in VPlan now. This patch moves the code to do so, to make
this clearer. Fixing up the loop exit value still relies on other
information and remains outside of VPlan for now.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100113
2021-08-06 08:29:20 +01:00
Chuanqi Xu 82ca845b47 [NFC] [FuncSpec] Update the Todo list for recursive functions
Now the recursive functions may get specialized many times when
`func-specialization-max-iters` increases. See discussion in
https://reviews.llvm.org/D106426 for details.
2021-08-06 14:43:17 +08:00
Amara Emerson 4fee756c75 Delete copy-ctor of MachineFrameInfo.
I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo()
without declaring the variable as a reference.

Deleting the copy-constructor also showed a place in the ARM backend which was
doing the same thing, albeit it didn't impact correctness there from the looks of it.
2021-08-05 23:24:37 -07:00
Kai Luo 666ee849f0 [PowerPC] Fix shift amount of xxsldwi when performing vector int_to_double
POC
```
// main.c
#include <stdio.h>
#include <altivec.h>
extern vector double foo(vector int s);
int main() {
  vector int s = {0, 1, 0, 4};
  vector double vd;
  vd = foo(s);
  printf("%lf %lf\n", vd[0], vd[1]);
  return 0;
}
// poc.c
vector double foo(vector int s) {
  int x1 = s[1];
  int x3 = s[3];
  double d1 = x1;
  double d3 = x3;
  vector double x = { d1, d3 };
  return x;
}
```
Compiled with `poc.c main.c -mcpu=pwr8 -O3` on BE machine.
Current clang gives
```
4.000000 1.000000
```
while xlc gives
```
1.000000 4.000000
```
Xlc's output should be correct.

Reviewed By: shchenz, #powerpc

Differential Revision: https://reviews.llvm.org/D107428
2021-08-06 06:01:29 +00:00
Arthur Eubanks a1b21ed3fb [GCov] Emit memset instead of stores in __llvm_gcov_reset
For a very large module, __llvm_gcov_reset can become very large.
__llvm_gcov_reset previously emitted stores to a bunch of globals in one
huge basic block. MemCpyOpt would turn many of these stores into
memsets, and updating MemorySSA would be extremely slow.

Verified that this makes the compile time of certain files go down
drastically (20min -> 5min).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107538
2021-08-05 22:40:15 -07:00
Serge Bazanski 7ece20505f [Lanai] fix lowering wide returns
This implements LanaiTargetLowering::CanLowerReturn, thereby ensuring
all return values conform to the RetCC and get sret-demoted as
necessary.

A regression test is also added that exercises this functionality.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D107086
2021-08-05 21:08:09 -07:00
Jinsong Ji 6f84d94b9c [PowerPC] Fix copy/paste error in scalar_to_vector patterns
https://reviews.llvm.org/D100478 refactoring added a copy/paste error
for v8i16 patterns.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D107609
2021-08-06 02:59:01 +00:00
Jessica Paquette e6a3944ea9 [AArch64][GlobalISel] Overhaul G_INSERT legalization
Similar cleanup to G_EXTRACT (51bd4e874f).

Also swap the order of clamp/widen to avoid unnecessary complex merges.

Add a bunch of missing testcases to legalize-inserts while we're at it.

Differential Revision: https://reviews.llvm.org/D107601
2021-08-05 18:28:22 -07:00
Jessica Paquette 562c8e14d9 [AArch64][GlobalISel] Widen G_IMPLICIT_DEF and G_FREEZE before clamping
Similar to other cleanup commits which widen instructions before clamping
during legalization. Purpose of this is to avoid weird type breakdowns.

In terms of G_IMPLICIT_DEF, this simplifies legalization for other instructions.
The legalizer has to emit G_IMPLICIT_DEF to legalize certain instructions, so
this can help with emitting merges elsewhere.

Differential Revision: https://reviews.llvm.org/D107604
2021-08-05 18:21:14 -07:00
Sean Fertile 23651c5ae0 [PowerPC][AIX] Create multiple constant sections.
Fixes issue where late materialized constants can be more strictly
aligned then their containing csect.

Differential Revision: https://reviews.llvm.org/D103103
2021-08-05 21:19:16 -04:00
Jon Roelofs 5fc7b1a260 Revert "[GlobalISel][KnownBits] Implement G_CTPOP"
This reverts commit ce6eb4f15a.

It's broken on the windows bots: https://reviews.llvm.org/D107606#2930121
2021-08-05 17:47:47 -07:00
Jon Roelofs ce6eb4f15a [GlobalISel][KnownBits] Implement G_CTPOP
Implementation copied almost verbatim from ValueTracking.

Differential revision: https://reviews.llvm.org/D107606
2021-08-05 17:17:29 -07:00
Ryan Prichard 623cf3dfdf Mark getc_unlocked as unavailable by default
Before D45736, getc_unlocked was available by default, but turned off
for non-Cygwin/non-MinGW Windows. D45736 then added 9 more unlocked
functions, which were unavailable by default, but it also:
 * left getc_unlocked enabled by default,
 * removed the disabling line for Windows, and
 * added code to enable getc_unlocked for GNU, Android, and OSX.

For consistency, make getc_unlocked unavailable by default. Maybe this
was the intent of D45736 anyway.

Reviewed By: MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D107527
2021-08-05 16:35:02 -07:00
Jessica Paquette 8a557d8311 [AArch64][GlobalISel] Widen extloads before clamping during legalization
Allows us to avoid awkward type breakdowns on types like s88, like the other
commits.

Differential Revision: https://reviews.llvm.org/D107587
2021-08-05 16:14:06 -07:00
Stanislav Mekhanoshin d71924fbfe [AMDGPU] Improve v2i32/v2f32 insertelt patterns
Using REG_SEQUENCE produces better code than INSERT_SUBREG,
we can omit one move instruction in many cases.

Fixes: SWDEV-298028

Differential Revision: https://reviews.llvm.org/D107602
2021-08-05 16:13:39 -07:00
Heejin Ahn 41ba39dfcd [WebAssembly] Don't do SjLj transformation when there's only setjmp
When there is a `setjmp` call in a function, we transform every callsite
of `setjmp` to record its information by calling `saveSetjmp` function,
and we also transform every callsite of a function that can longjmp to
to check if a longjmp occurred and if so jump to the corresponding
post-setjmp BB. Currently we are doing this for every function that
contains a call to `setjmp`, but if there is no other function call
within that function that can longjmp, this transformation of `setjmp`
callsite and all the preparation of `setjmpTable` in the entry of the
function are not necessary.

This checks if a setjmp-calling function has any other calls that can
longjmp, and if not, skips the function for the SjLj transformation.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D107530
2021-08-05 15:28:02 -07:00
David Green 649cf4514d [AArch64] Expand the SVE min/max reduction costs to NEON
This takes the existing SVE costing for the various min/max reduction
intrinsics and expands it to NEON, where I believe it applies equally
well.

In the process it changes the lowering to use min/max cost, as opposed
to summing up the cost of ICmp+Select.

Differential Revision: https://reviews.llvm.org/D106239
2021-08-05 23:23:24 +01:00
Jessica Paquette 36498374d4 [AArch64][GlobalISel] Widen G_BSWAP before clamping
This allows us to avoid odd type breakdowns + allows us to legalize types like
s88 in the first place.

Add some testcases for known legal types + testcases for s4 and s88.

Differential Revision: https://reviews.llvm.org/D107607
2021-08-05 15:16:00 -07:00
Jessica Paquette 51bd4e874f [AArch64][GlobalISel] Overhaul G_EXTRACT legalization
This simplifies our existing G_EXTRACT rules and adds some test coverage. Mostly
changing this because it should make it easier to improve legalization for
instructions which use G_EXTRACT as part of the legalization process.

This also adds support for legalizing some weird types. Similar to other recent
legalizer changes, this changes the order of widening/clamping.

There was some dead code in our existing rules (e.g. the p0 case would never get
hit), so this knocks those out and makes the types we want to handle explicit.

This also removes some checks which, nowadays, are handled by the
MachineVerifier.

Differential Revision: https://reviews.llvm.org/D107505
2021-08-05 13:55:15 -07:00
Jon Roelofs 98f38c151b [AArch64][GlobalISel] Legalize ctpop s128
This is re-landing the same patch again, but without the changes to
LegalizerHelper that regressed the Mips test:

test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll

Differential revision: https://reviews.llvm.org/D106494
2021-08-05 11:54:53 -07:00
Chris Jackson 113a06f7a5 {DebugInfo][LSR] Don't cache dbg.value that are already undef
The SCEV-based salvaging method caches dbg.value information pre-LSR so
that salvaging may be attempted post-LSR. If the dbg.value are already
undef pre-LSR then a salvage attempt would be fruitless, so avoid
caching them.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D107448
2021-08-05 19:16:43 +01:00
Roman Lebedev c0586ff05d
[NFC][X86] combineX86ShuffleChain(): hoist Mask variable higher up
Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef`
to it is confusing. Instead, just move the `Mask` vector higher up,
and change the code that earlier had no access to it but now does
to use `Mask` instead of `BaseMask`.

This has no other intentional changes.

This is a recommit of 35c0848b57,
that was reverted to simplify reversion of an earlier change.
2021-08-05 20:37:51 +03:00
Ramesh Peri 976bd23612 [llvm-ar] Fix for handling thin archive with SYM64 and a test case for it
WHen thin archives are created which have symbol table of type SYM64 then all the tools will not work since they cannot read the files properly.
One can reproduce the problem as follows:
1. Take a hello world program and create an archive out of it. The SYM64_THRESHOLD=0 will force the generation of SYM64 symbol table.
    clang -c hello.cpp
    SYM64_THRESHOLD=0 llvm-ar crsT mylib.a hello.o
2. Now try to use any of the tools on this mylib.a and it will fail.
    llvm-nm -M mylib.a

THis fix will eliminate these failures. A regression test is created in llvm/test/Object/archive-symtab.test

Reviewed By: MaskRay, Ramesh

Differential Revision: https://reviews.llvm.org/D107322
2021-08-05 10:06:34 -07:00
Benjamin Kramer bd17ced1db Revert "[X86] combineX86ShuffleChain(): canonicalize mask elts picking from splats"
This reverts commits f819e4c7d0 and
35c0848b57. It triggers an infinite loop during
compilation.

$ cat t.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @MaxPoolGradGrad_1.65() local_unnamed_addr #0 {
entry:
  %wide.vec78 = load <64 x i32>, <64 x i32>* null, align 16
  %strided.vec83 = shufflevector <64 x i32> %wide.vec78, <64 x i32> poison, <8 x i32> <i32 4, i32 12, i32 20, i32 28, i32 36, i32 44, i32 52, i32 60>
  %0 = lshr <8 x i32> %strided.vec83, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
  %1 = add <8 x i32> zeroinitializer, %0
  %2 = shufflevector <8 x i32> %1, <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %3 = shufflevector <16 x i32> %2, <16 x i32> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  %interleaved.vec = shufflevector <32 x i32> undef, <32 x i32> %3, <64 x i32> <i32 0, i32 8, i32 16, i32 24, i32 32, i32 40, i32 48, i32 56, i32 1, i32 9, i32 17, i32 25, i32 33, i32 41, i32 49, i32 57, i32 2, i32 10, i32 18, i32 26, i32 34, i32 42, i32 50, i32 58, i32 3, i32 11, i32 19, i32 27, i32 35, i32 43, i32 51, i32 59, i32 4, i32 12, i32 20, i32 28, i32 36, i32 44, i32 52, i32 60, i32 5, i32 13, i32 21, i32 29, i32 37, i32 45, i32 53, i32 61, i32 6, i32 14, i32 22, i32 30, i32 38, i32 46, i32 54, i32 62, i32 7, i32 15, i32 23, i32 31, i32 39, i32 47, i32 55, i32 63>
  store <64 x i32> %interleaved.vec, <64 x i32>* undef, align 16
  unreachable
}

$ llc < t.ll -mcpu=skylake
<hang>
2021-08-05 18:58:08 +02:00
Jessica Paquette f3f3098afe [AArch64][GlobalISel] Mark v16s8 <- v8s8, v8s8 G_CONCAT_VECTOR as legal
G_CONCAT_VECTORS shows up from time to time when legalizing other instructions.

We actually import patterns for the v16s8 <- v8s8, v8s8 case so marking it
as legal gives us selection for free.

Differential Revision: https://reviews.llvm.org/D107512
2021-08-05 09:40:46 -07:00
Kazu Hirata 72661f337a [Transforms] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-08-05 08:53:17 -07:00
Alexey Bataev e7c3eaa8ae [SLP]Do not emit extra shuffle for insertelements vectorization.
If the vectorized insertelements instructions form indentity subvector
(the subvector at the beginning of the long vector), it is just enough
to extend the vector itself, no need to generate inserting subvector
shuffle.

Differential Revision: https://reviews.llvm.org/D107494
2021-08-05 08:41:24 -07:00
Craig Topper f7076cfd3a [DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits contant folding.
We don't have real demanded bits support for MULHU, but we can
still use the known bits based constant folding support at the end
of SimplifyDemandedBits to simplify a MULHU. This helps with cases
where we know the LHS and RHS have enough leading zeros so that
the high multiply result is always 0.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D106471
2021-08-05 08:31:26 -07:00
David Sherwood e9177b0958 Fix build issues caused by 95800da914 2021-08-05 16:26:34 +01:00
Sander de Smalen 3e47f009ff [LV] Consider ExtractValue as uniform.
Since all operands to ExtractValue must be loop-invariant when we deem
the loop vectorizable, we can consider ExtractValue to be uniform.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107286
2021-08-05 16:20:50 +01:00
eopXD fd7f6a3c81 [NFC][LoopIdiom] rename boolean variable NegStride to IsNegStride
Rename variable for better code readability.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107570
2021-08-05 23:11:42 +08:00
Jay Foad 2b63933115 [AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107566
2021-08-05 15:57:40 +01:00
Jay Foad e6c364a624 [AMDGPU][SDag] Better lowering for 64-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107546
2021-08-05 15:57:40 +01:00
Momchil Velikov f171149e0d [SimpifyCFG] Speculate a store preceded by a local non-escaping load
In SimplifyCFG we may simplify the CFG by speculatively executing
certain stores, when they are preceded by a store to the same
location.  This patch allows such speculation also when the stores are
similarly preceded by a load.

In order for this transformation to be correct we need to ensure that
the memory location is writable and the store in the new location does
not introduce a data race.

Local objects (created by an `alloca` instruction) are always
writable, so once we are past a read from a location it is valid to
also write to that same location.

Seeing just a load does not guarantee absence of a data race (unlike
if we see a store) - the load may still be part of a race, just not
causing undefined behaviour
(cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic).

In the original program, a data race might have been prevented by the
condition, but once we move the store outside the condition, we must
be sure a data race wasn't possible anyway, no matter what the
condition evaluates to.

One way to be sure that a local object is never concurrently
read/written is check that its address never escapes the function.

Hence this transformation is restricted to local, non-escaping
objects.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D107281
2021-08-05 15:54:42 +01:00
Simon Pilgrim 2cbf9fd402 [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.

This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.

This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.

Fixes PR50053

Differential Revision: https://reviews.llvm.org/D107068
2021-08-05 15:40:48 +01:00
Florian Hahn 38b098be66
[VectorCombine] Limit scalarization known non-poison indices.
We can only trust the range of the index if it is guaranteed
non-poison.

Fixes PR50949.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107364
2021-08-05 15:36:31 +01:00
Dawid Jurczak 06206a8cd1 [BuildLibCalls][NFC] Remove redundant attribute list from emitCalloc
Additionally with this patch aligned DSE which is the only user of emitCalloc.

Differential Revision: https://reviews.llvm.org/D103523
2021-08-05 16:18:38 +02:00
David Sherwood 95800da914 [LoopVectorize] Add support for replication of more intrinsics with scalable vectors
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
  loop invariant input operands then these are also uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intriniscs:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284
2021-08-05 15:17:27 +01:00
Dawid Jurczak f8cdde7195 [SimplifyLibCalls][NFC] Clean up LibCallSimplifier from 'memset + malloc into calloc' transformation
FoldMallocMemset can be safely removed because since https://reviews.llvm.org/D103009
such transformation is already performed in DSE.

Differential Revision: https://reviews.llvm.org/D103451
2021-08-05 16:08:32 +02:00
Krzysztof Parzyszek d0c3b61498 Delay initialization of OptBisect
When LLVM is used in other projects, it may happen that global cons-
tructors will execute before the call to ParseCommandLineOptions.
Since OptBisect is initialized via a constructor, and has no ability
to be updated at a later time, passing "-opt-bisect-limit" to the
parse function may have no effect.

To avoid this problem use a cl::cb (callback) to set the bisection
limit when the option is actually processed.

Differential Revision: https://reviews.llvm.org/D104551
2021-08-05 09:04:17 -05:00
Bardia Mahjour 0e08891ec1 [DA] control compile-time spent by MIV tests
Function exploreDirections() in DependenceAnalysis implements a recursive
algorithm for refining direction vectors. This algorithm has worst-case
complexity of O(3^(n+1)) where n is the number of common loop levels.
In this patch I'm adding a threshold to control the amount of time we
spend in doing MIV tests (which most of the time end up resulting in over
pessimistic direction vectors anyway).

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D107159
2021-08-05 09:50:11 -04:00
Sander de Smalen 8d08a84745 [LV] Remove a change that was added in D106164.
This change wasn't strictly necessary for D106164 and could be removed.
This patch addresses the post-commit comments from @fhahn on D106164, and
also changes sve-widen-gep.ll to use the same IR test as shown in
pointer-induction.ll.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D106878
2021-08-05 14:44:53 +01:00
Paul Robinson 75aa3d520d Add a DIExpression const-folder to prevent silly expressions.
It's entirely possible (because it actually happened) for a bool
variable to end up with a 256-bit DW_AT_const_value.  This came about
when a local bool variable was initialized from a bitfield in a
32-byte struct of bitfields, and after inlining and constant
propagation, the variable did have a constant value. The sequence of
optimizations had it carrying "i256" values around, but once the
constant made it into the llvm.dbg.value, no further IR changes could
affect it.

Technically the llvm.dbg.value did have a DIExpression to reduce it
back down to 8 bits, but the compiler is in no way ready to emit an
oversized constant *and* a DWARF expression to manipulate it.
Depending on the circumstances, we had either just the very fat bool
value, or an expression with no starting value.

The sequence of optimizations that led to this state did seem pretty
reasonable, so the solution I came up with was to invent a DWARF
constant expression folder.  Currently it only does convert ops, but
there's no reason it couldn't do other ops if that became useful.

This broke three tests that depended on having convert ops survive
into the DWARF, so I added an operator that would abort the folder to
each of those tests.

Differential Revision: https://reviews.llvm.org/D106915
2021-08-05 06:14:40 -07:00
Petar Avramovic 66de26b1f9 GlobalISel: Fix matchEqualDefs for instructions with multiple defs
Instructions that produceSameValue produce same values for operands with
same index. matchEqualDefs used to return true for any two values from
different instructions that produce same values. Fix this by checking if
values are defined by operands with the same index.

Differential Revision: https://reviews.llvm.org/D107362
2021-08-05 15:05:45 +02:00
Simon Pilgrim e78bf49a58 [X86] Rename Subtarget Tuning Feature Flag Prefix. NFC.
As suggested on D107370, this patch renames the tuning feature flags to start with 'Tuning' instead of 'Feature'.

Differential Revision: https://reviews.llvm.org/D107459
2021-08-05 13:09:23 +01:00
Dominik Montada cc947e29ea [GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107330
2021-08-05 13:52:10 +02:00
Fraser Cormack 0b8471e91b [SelectionDAG] Correctly determine the VECREDUCE_SEQ_FMUL action
The LegalizeAction for this node should follow the logic for
`VECREDUCE_SEQ_FADD` and be determined using the vector operand's type.

here isn't an in-tree target that makes use of this, but I think it's safe to
say this is how it should behave, should a target want to customize the action
for this node.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107478
2021-08-05 09:42:33 +01:00
Jay Foad e790b2b744 [AMDGPU] Make more use of getHiHalf64 and split64BitValue. NFCI. 2021-08-05 09:36:13 +01:00
Igor Kudrin 2c14798ead [ARM][llvm-objdump] Annotate PC-relative memory operands of VLDR instructions
This extends D105979 and adds support for VLDR instructions.

Differential Revision: https://reviews.llvm.org/D105980
2021-08-05 14:11:11 +07:00
Igor Kudrin ddbe812bcc [ARM][llvm-objdump] Annotate PC-relative memory operands
This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for
Arm so that the disassembler can print the target address of memory
operands that use PC+immediate addressing.

Differential Revision: https://reviews.llvm.org/D105979
2021-08-05 14:11:11 +07:00
eopXD 26aa1bbe97 [NFCI] [LoopIdiom] Let processLoopStridedStore take StoreSize as SCEV instead of unsigned
Letting it take SCEV allows further modification on the function to optimize
if the StoreSize / Stride is runtime determined.

This is a preceeding of D107353.
The big picture is to let LoopIdiom deal with runtime-determined sizes.

Reviewed By: Whitney, lebedev.ri

Differential Revision: https://reviews.llvm.org/D104595
2021-08-05 13:21:48 +08:00
Heejin Ahn aa0b0fbbe6 [WebAssembly] Use `SDValue::getConstantOperandVal` (NFC)
Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D107499
2021-08-04 21:15:23 -07:00
Nathan Lanza 5848166369 Disable LibFuncs for stpcpy and stpncpy for Android < 21
These functions don't exist in android API levels < 21. A change in
llvm-12 (rG6dbf0cfcf789) caused Oz builds to emit this symbol assuming
it's available and thus is causing link errors. Simply disable it here.

Differential Revision: https://reviews.llvm.org/D107509
2021-08-04 22:48:41 -04:00
Matt Jacobson 75abeb64ce [AVR] emit 'MCSA_Global' references to '__do_global_ctors' and '__do_global_dtors'
Emit references to '__do_global_ctors' and '__do_global_dtors' to allow
constructor/destructor routines to run.

Reviewed by: MaskRay

Differential Revision: https://reviews.llvm.org/D107133
2021-08-05 10:37:36 +08:00
modimo 041b525141 [CSSPGO] Remove used of PseudoProbeAttributes::Reserved
D106861 added usage of PseudoProbeAttributes::Reserved as TailCall however this usage hasn't been committed/reviewed. Removing this usage.

Testing
ninja check-all

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D107514
2021-08-04 17:23:56 -07:00
Yonghong Song e52946b9ab BPF: avoid NE/EQ loop exit condition
Kuniyuki Iwashima reported in [1] that llvm compiler may
convert a loop exit condition with "i < bound" to "i != bound", where
"i" is the loop index variable and "bound" is the upper bound.
In case that "bound" is not a constant, verifier will always have "i != bound"
true, which will cause verifier failure since to verifier this is
an infinite loop.

The fix is to avoid transforming "i < bound" to "i != bound".
In llvm, the transformation is done by IndVarSimplify pass.
The compiler checks loop condition cost (i = i + 1) and if the
cost is lower, it may transform "i < bound" to "i != bound".
This patch implemented getArithmeticInstrCost() in BPF TargetTransformInfo
class to return a higher cost for such an operation, which
will prevent the transformation for the test case
added in this patch.

 [1] https://lore.kernel.org/netdev/1994df05-8f01-371f-3c3b-d33d7836878c@fb.com/

Differential Revision: https://reviews.llvm.org/D107483
2021-08-04 16:54:16 -07:00
Jessica Paquette ca2e053652 [AArch64][GlobalISel] Legalize wide vector G_PHIs
Clamp the max number of elements when legalizing G_PHI. This allows us to
legalize some common fallbacks like 4 x s64.

Here's an example: https://godbolt.org/z/6YocsEYTd

Had to add -global-isel-abort=0 to legalize-phi.mir to account for the
G_EXTRACT_VECTOR_ELT from the 32 x s8 G_PHI.

Differential Revision: https://reviews.llvm.org/D107508
2021-08-04 16:48:59 -07:00
Heejin Ahn 31a71a393f [WebAssembly] Make result of 'catch' inst variadic
`catch` instruction can have any number of result values depending on
its tag, but so far we have only needed a single i32 return value for
C++ exception so the instruction was specified that way. But using the
instruction for SjLj handling requires multiple return values.

This makes `catch` instruction's results variadic and moves selection of
`throw` and `catch` instruction from ISelLowering to ISelDAGToDAG.
Moving `catch` to ISelDAGToDAG is necessary because I am not aware of
a good way to do instruction selection for variadic output instructions
in TableGen. This also moves `throw` because 1. `throw` and `catch`
share the same utility function and 2. there is really no reason we
should do that in ISelLowering in the first place. What we do is mostly
the same in both places, and moving them to ISelDAGToDAG allows us to
remove unnecessary mid-level nodes for `throw` and `catch` in
WebAssemblyISD.def and WebAssemblyInstrInfo.td.

This also adds handling for new `catch` instruction to AsmTypeCheck.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D107423
2021-08-04 14:05:33 -07:00
Fangrui Song 9c19b36f1c [X86] Remove -x86-experimental-pref-loop-alignment in favor of -align-loops 2021-08-04 13:23:57 -07:00
Fangrui Song a194438615 [CodeGen] Add -align-loops
to `lib/CodeGen/CommandFlags.cpp`. It can replace
-x86-experimental-pref-loop-alignment=.

The loop alignment is only used by MachineBlockPlacement.
The implementation uses a new `llvm::TargetOptions` for now, as
an IR function attribute/module flags metadata may be overkill.

This is the llvm part of D106701.
2021-08-04 12:45:18 -07:00
Michael Liao 5edc886e90 [amdgpu] Add an enhanced conversion from i64 to f32.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D107187
2021-08-04 15:33:12 -04:00
Nikita Popov bb15861e14 [MemCpyOpt] Relax libcall checks
Rather than blocking the whole MemCpyOpt pass if the libcalls are
not available, only disable creation of new memset/memcpy intrinsics
where only load/stores were used previously. This only affects the
store merging and load-store conversion optimization. Other
optimizations are derived from existing intrinsics, which are
well-defined in the absence of libcalls -- not having the libcalls
just means that call simplification won't convert them to intrinsics.

This is a weaker variation of D104801, which dropped these checks
entirely. Ideally we would not couple emission of intrinsics to
libcall availability at all, but as the intrinsics may be legalized
to libcalls we need to be a bit careful right now.

Differential Revision: https://reviews.llvm.org/D106769
2021-08-04 21:17:51 +02:00
Giorgis Georgakoudis 29a3e3dd7b [OpenMPOpt] Expand SPMDization with guarding for target parallel regions
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region, thus must be not be guarded. This check is implemented using the ParallelLevels AA.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106892
2021-08-04 11:49:24 -07:00
Craig Topper c23405174a [DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS.
This allows special constants like to 0 to be recognized. It's also
expected by isel patterns if a target had a mulh with immediate instructions.
The commuting done by tablegen won't commute patterns with immediates since it
expects DAGCombine to have done it.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107486
2021-08-04 11:39:23 -07:00
Alexey Bataev 214f99b27c Revert "[SLP]Do not emit extra shuffle for insertelements vectorization."
This reverts commit 871ea69803 to fix the
problem if the first vector is not just undef.
2021-08-04 11:28:59 -07:00
Reshabh Sharma dce35ef104 Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list"
This reverts commit d42e70b3d3.
2021-08-04 23:33:31 +05:30
Dawid Jurczak 238139be09 [DSE][NFC] Clean up DeadStoreElimination from unused variables
Differential Revision: https://reviews.llvm.org/D106446
2021-08-04 19:44:40 +02:00
Craig Topper 643ce70a64 [RISCV] Remove the _COMMUTABLE and _TA versions of FMA and wide FMA vector instructions.
Use a tail policy operand instead. Inspired by the work in D105092,
but without the intrinsic interface changes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106512
2021-08-04 10:39:50 -07:00
Jessica Paquette d9279843b1 [AArch64][GlobalISel] Widen G_PHI before clamping it during legalization
This allows us to handle weird types like s88; we first widen to s128, then
clamp back down to s64.

https://godbolt.org/z/9xqbP46Mz

Also this makes it possible for GISel to legalize the case in pr48188.ll. It
now does the same thing as SDAG, although regalloc chooses different registers.

Differential Revision: https://reviews.llvm.org/D107417
2021-08-04 10:25:14 -07:00
Jessica Paquette 7d97de60b3 [AArch64][GlobalISel] Widen G_FPTO*I before clamping
Going through our legalization rules and doing some cleanup.

Widening and then clamping is usually easier than clamping and then widening.

This allows us to legalize some weird types like s88.

Differential Revision: https://reviews.llvm.org/D107413
2021-08-04 10:19:26 -07:00
Petr Hosek 6660cec568 [InstrProfiling] Emit bias variable eagerly
Rather than emitting the bias variable lazily as needed, emit it
eagerly. This allows profile runtime to refer to this variable
unconditionally without having to use the weak reference. The bias
variable is in a COMDAT so there'll never be more than one instance,
and if it's not needed, linker should be able to GC it, so the overhead
should be minimal.

Differential Revision: https://reviews.llvm.org/D107377
2021-08-04 10:17:08 -07:00
Andrea Di Biagio 7a1a35a1d1 [X86][SchedModel] Add missing ReadAdvance for some arithmetic ops (PR51318 and PR51322).
This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in
the RM/MR variants of ADC/SBB (PR51318)

This also fixes the absence of ReadAdvance for the register operand of
RMW arithmetic instructions (PR51322).

Differential Revision: https://reviews.llvm.org/D107367
2021-08-04 17:50:22 +01:00
jamesluox ee7d20e846 [CSSPGO] Migrate and refactor the decoder of Pseudo Probe
Migrate pseudo probe decoding logic in llvm-profgen to MC, so other LLVM-base program could reuse existing codes. Redesign object layout of encoded and decoded pseudo probes.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D106861
2021-08-04 09:21:34 -07:00
Sander de Smalen fe6ae81ef3 [InstCombine] Fix vscale zext/sext optimization when vscale_range is unbounded.
According to the LangRef, a (vscale_range) value of 0 means unbounded.

This patch additionally cleans up the test file vscale_sext_and_zext.ll.
2021-08-04 17:17:37 +01:00
Bradley Smith d9cc5d84e4 [AArch64][SVE] Combine bitcasts of predicate types with vector inserts/extracts of loads/stores
An insert subvector that is inserting the result of a vector predicate
sized load into undef at index 0, whose result is casted to a predicate
type, can be combined into a direct predicate load. Likewise the same
applies to extract subvector but in reverse.

The purpose of this optimization is to clean up cases that will be
introduced in a later patch where casts to/from predicate types from i8
types will use insert subvector, rather than going through memory early.

This optimization is done in SVEIntrinsicOpts rather than InstCombine to
re-introduce scalable loads as late as possible, to give other
optimizations the best chance possible to do a good job.

Differential Revision: https://reviews.llvm.org/D106549
2021-08-04 15:51:14 +00:00
Simon Wallis 9269752671 [AArch64] Fix assert AArch64TargetLowering::ReplaceNodeResults
Don't know how to custom expand this
UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788

The fix is to provide missing expansions for:
  case ISD::STRICT_FP_TO_UINT:
  case ISD::STRICT_FP_TO_SINT:

A test case is provided.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107452
2021-08-04 16:18:19 +01:00
Chris Jackson 21ee38e24f [DebugInfo][LSR] Avoid crashes on large integer inputs
SCEV-based salvaging in LSR translates SCEVs to DIExpressions. SCEVs may
contain very large integers but the translation does not support
integers greater than 64 bits. This patch adds checks to ensure
conversions of these large integers is not attempted. A regression test
is added to ensure no such translation is attempted.

Reviewed by: StephenTozer

PR: https://bugs.llvm.org/show_bug.cgi?id=51329

Differential Revision: https://reviews.llvm.org/D107438
2021-08-04 15:51:22 +01:00
Reshabh Sharma d42e70b3d3 [AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682
2021-08-04 19:53:33 +05:30
Roman Lebedev 35c0848b57
[NFC][X86] combineX86ShuffleChain(): hoist Mask variable higher up
Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef`
to it is confusing. Instead, just move the `Mask` vector higher up,
and change the code that earlier had no access to it but now does
to use `Mask` instead of `BaseMask`.

This has no other intentional changes.
2021-08-04 17:15:12 +03:00
Roman Lebedev 916cdc3d4b
[NFC][X86] combineX86ShuffleChain(): rename inner Mask to avoid future shadowing
I want to hoist `Mask` variable higher up,
but then it would clash with this one.
So let's rename this one first.

There are no other intentional changes here other than said rename.
2021-08-04 17:15:12 +03:00
Tomas Matheson 40650f27b5 [ARM][atomicrmw] Fix CMP_SWAP_32 expand assert
This assert is intended to ensure that the high registers are not
selected when it is passed to one of the thumb UXT instructions. However
it was triggering even for 32 bit where no UXT instruction is emitted.

Fixes PR51313.

Differential Revision: https://reviews.llvm.org/D107363
2021-08-04 15:02:02 +01:00
Roman Lebedev f819e4c7d0
[X86] combineX86ShuffleChain(): canonicalize mask elts picking from splats
Given a shuffle mask, if it is picking from an input that is splat
given the current granularity of the shuffle, then adjust the mask
to pick from the same lane of the input as the mask element is in.
This may result in a shuffle being simplified into a blend.

I believe this is correct given that the splat detection matches the one
just above the new code,

My basic thought is that we might be able to get less regressions
by handling multiple insertions of the same value into a vector
if we form broadcasts+blend here, as opposed to D105390,
but i have not really thought this through,
and did not try implementing it yet.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107009
2021-08-04 16:55:04 +03:00
David Green eeddcba525 [RDA] Attempt to make RDA subreg aware
This attempts to make more of RDA aware of potentially overlapping
subregisters. Some of this was already in place, with it iterating
through MCRegUnitIterators. This also replaces calls to
LiveRegs.contains(..) with !LiveRegs.available(..), and updates the
isValidRegUseOf and isValidRegDefOf to search subregs.

Differential Revision: https://reviews.llvm.org/D107351
2021-08-04 14:21:32 +01:00
Simon Pilgrim 8cd40ece70 [X86] Rename X86 tuning feature flag FeatureHasFastGather -> FeatureFastGather
Match the naming style used by the other 'FeatureFast/FeatureSlow' tuning flags.
2021-08-04 13:07:50 +01:00
Simon Pilgrim 17e8ac0703 [X86] Move FeatureFastBEXTR from bdver2 features to tuning
Noticed while looking at the feature flag renaming suggested in D107370
2021-08-04 13:07:49 +01:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Simon Pilgrim fc8dee1ebb [X86] Split Subtarget ISA / Security / Tuning Feature Flags Definitions. NFC
Our list of slow/fast tuning feature flags has become pretty extensive and is randomly interleaved with ISA and Security (Retpoline etc.) flags, not even based on when the ISAs/flags were introduced, making it tricky to locate them. Plus we started treating tuning flags separately some time ago, so this patch tries to group the flags to match.

I've left them mostly in the same order within each group - I'm happy to rearrange them further if there are specific ISA or Tuning flags that you think should be kept closer together.

Differential Revision: https://reviews.llvm.org/D107370
2021-08-04 11:16:36 +01:00
Tim Northover d7b0e5525a X86: fix frame offset calculation with mandatory tail calls
If there's a region of the stack reserved for potential tail call arguments
(only the case when we guarantee tail calls will be honoured), this is right
next to the incoming stored return address, not necessarily next to the
callee-saved area, so combining the two into a single figure leads to incorrect
offsets in some edge cases.
2021-08-04 10:02:42 +01:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Sjoerd Meijer 30fbb06979 [FuncSpec] Support specialising recursive functions
This adds support for specialising recursive functions. For example:

    int Global = 1;
    void recursiveFunc(int *arg) {
      if (*arg < 4) {
        print(*arg);
        recursiveFunc(*arg + 1);
      }
    }
    void main() {
      recursiveFunc(&Global);
    }

After 3 iterations of function specialisation, followed by inlining of the
specialised versions of recursiveFunc, the main function looks like this:

    void main() {
      print(1);
      print(2);
      print(3);
    }

To support this, the following has been added:
- Update the solver and state of the new specialised functions,
- An optimisation to propagate constant stack values after each iteration of
  function specialisation, which is necessary for the next iteration to
  recognise the constant values and trigger.

Specialising recursive functions is (at the moment) controlled by option
-func-specialization-max-iters and is opt-in for compile-time reasons. I.e.,
the default is -func-specialization-max-iters=1, but for the example above we
would need to use -func-specialization-max-iters=3. Future work is to see if we
can increase the default, or improve the cost-model/heuristics to control
compile-times.

Differential Revision: https://reviews.llvm.org/D106426
2021-08-04 08:07:04 +01:00
Senran Zhang 486b6013f9 [Support] Initialize common options in `getRegisteredOptions`
This allows users accessing options in libSupport before invoking
`cl::ParseCommandLineOptions`, and also matches the behavior before
D105959.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106334
2021-08-03 23:59:10 -07:00
hsmahesha 596e61c332 [AMDGPU] Ignore call graph node which does not have function info.
While collecting reachable callees (from kernels), ignore call graph node which
does not have associated function or associated function is not a definition.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D107329
2021-08-04 10:22:33 +05:30
Senran Zhang df4e0beaeb [NFC][ConstantFold] Check getAggregateElement before getSplatValue call
Constant::getSplatValue has O(N) time complexity in the worst case,
where N is the # of elements in a vector. So we call
Constant::getAggregateElement first and return earlier if possible to
avoid unnecessary getSplatValue calls.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107252
2021-08-03 21:52:14 -07:00
Heejin Ahn 9bd02c433b [WebAssembly] Misc. cosmetic changes in EH (NFC)
- Rename `wasm.catch` intrinsic to `wasm.catch.exn`, because we are
  planning to add a separate `wasm.catch.longjmp` intrinsic which
  returns two values.
- Rename several variables
- Remove an unnecessary parameter from `canLongjmp` and `isEmAsmCall`
  from LowerEmscriptenEHSjLj pass
- Add `-verify-machineinstrs` in a test for a safety measure
- Add more comments + fix some errors in comments
- Replace `std::vector` with `SmallVector` for cases likely with small
  number of elements
- Renamed `EnableEH`/`EnableSjLj` to `EnableEmEH`/`EnableEmSjLj`: We are
  soon going to add `EnableWasmSjLj`, so this makes the distincion
  clearer

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D107405
2021-08-03 21:03:46 -07:00
Arthur Eubanks ad25344620 [MC][CodeGen] Emit constant pools earlier
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.

We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.

Fixes PR51208

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107314
2021-08-03 20:55:31 -07:00
Jacob Hegna b16c37fa2c [MLGO] Update the current model url for the Oz inliner model. 2021-08-04 03:09:00 +00:00
Jessica Paquette 5643736378 [AArch64][GlobalISel] Widen G_SELECT before clamping it
This allows us to handle the s88 G_SELECTS:

https://godbolt.org/z/5s18M4erY

Weird types like this can result in weird merges.

Widening to s128 first and then clamping down avoids that situation.

Differential Revision: https://reviews.llvm.org/D107415
2021-08-03 18:31:17 -07:00
Shimin Cui 2d9759c790 [GlobalOpt] Fix the load types when OptimizeGlobalAddressOfMalloc
Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may not be true any more (as seen in the test case), and we miss the code to handle this, This is to fix that.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107397
2021-08-03 19:22:53 -04:00
Craig Topper b818da27ab [SimplifyCFG] Enable switch to lookup table for more types.
This transform has been restricted to legal types since
https://reviews.llvm.org/rG65df808f6254617b9eee931d00e95d900610b660
in 2012.

This is particularly restrictive on RISCV64 which only has i64
as a legal integer type. i32 is a very common type in code
generated from C, but we won't form a lookup table with it.
This also effects other common types like i8/i16 types on ARM,
AArch64, RISCV, etc.

This patch proposes to allow power of 2 types larger than 8 bit, if
they will fit in the largest legal integer type in DataLayout.
These types are common in C code so generally well handled in
the backends.

We could probably do this for other types like i24 and rely on
alignment and padding to allow the backend to use a single wider
load. This isn't my main concern right now and it will need more
tests.

We could also allow larger types up to some limit and let the
backend split into multiple loads, but we need to define that
limit. It's also not my main concern right now.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107233
2021-08-03 15:35:16 -07:00
Evandro Menezes 63a5ac4e0d [RISCV] Add scheduling resources for V
Add the scheduling resources for the V extension instructions.

Differential Revision: https://reviews.llvm.org/D98002
2021-08-03 15:47:51 -05:00
modimo f5b8a3125a [ThinLTO] Add TimeTrace for Thinlink step
Results from Clang self-build:

{F17435948}

Testing:
ninja check-all

Reviewed By: anton-afanasyev

Differential Revision: https://reviews.llvm.org/D104428
2021-08-03 13:20:04 -07:00
Alexey Bataev 871ea69803 [SLP]Do not emit extra shuffle for insertelements vectorization.
If the vectorized insertelements instructions form indentity subvector
(the subvector at the beginning of the long vector), it is just enough
to extend the vector itself, no need to generate inserting subvector
shuffle.

Differential Revision: https://reviews.llvm.org/D107344
2021-08-03 13:18:41 -07:00
Alexey Bataev 7d9d926a18 Revert "[SLP]Improve graph reordering."
This reverts commit e408d1dfab and
2 other (4b25c11321 and
c2deb2afaf) related to fix the problem with the
reordering shuffles.
2021-08-03 12:13:43 -07:00
Sami Tolvanen 7ce1c4da77 ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Relands 700d07f8ce with -msvc targets
fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-08-03 11:35:30 -07:00
Dylan Fleming 3943a74666 [InstCombine] Fixed select + masked load fold failure
Fixed type assertion failure caused by trying to fold a masked load with a
select where the select condition is a scalar value

Reviewed By: sdesmalen, lebedev.ri

Differential Revision: https://reviews.llvm.org/D107372
2021-08-03 19:06:12 +01:00
Philip Reames 223835f08b [runtimeunroll] A bit of style cleanup to simplify a following change [NFC]
Use for-range, use the idiomatic pattern for non-loop values, etc..
2021-08-03 10:28:46 -07:00
David Green bd07c2e266 [AArch64] Prefer fmov over orr v.16b when copying f32/f64
This changes the lowering of f32 and f64 COPY from a 128bit vector ORR to
a fmov of the appropriate type. At least on some CPU's with 64bit NEON
data paths this is expected to be faster, and shouldn't be slower on any
CPU that treats fmov as a register rename.

Differential Revision: https://reviews.llvm.org/D106365
2021-08-03 17:25:40 +01:00
Craig Topper deaeb16d88 [RISCV] Indicate that RISCVMergeBaseOffsetOpt preserves the CFG.
Return false from runOnFunction if nothing changed. Curiously
we already returned a bool from detectAndFoldOffset, but didn't
use it.

Fix a couple breaks after returns that I saw while auditing
detectAndFoldOffset.

Differential Revision: https://reviews.llvm.org/D107303
2021-08-03 08:32:36 -07:00
Simon Pilgrim 11396641e4 [DAG] Cleanup DAGCombiner::CombineConsecutiveLoads early-outs. NFCI.
We had some similar hasOneUse/isNON_EXTLoad early-outs spread out over different parts of the method - we should pull them all together.

Noticed while triaging PR45116
2021-08-03 13:47:55 +01:00
Krishna 946fd4ea65 Revert "[InstCombine] Remove nnan requirement for transformation to fabs from select"
This reverts commit 6180ce2e2a.
2021-08-03 18:08:11 +05:30
Krishna d99260641b [InstCombine] Fold phi ( inttoptr/ptrtoint x ) to phi (x)
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support to specific cases where this optimization works.

In this patch, we focus on phi-node operands with inttoptr casts.
We know that ptrtoint( inttoptr( ptrtoint x) ) is same as ptrtoint (x).
So, we want to remove this roundtrip cast which goes through phi-node.

Reviewed By: aqjune

Differential Revision: https://reviews.llvm.org/D106289
2021-08-03 17:52:59 +05:30
Krishna 6180ce2e2a [InstCombine] Remove nnan requirement for transformation to fabs from select
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).

Differential Revision: https://reviews.llvm.org/D106872
2021-08-03 17:52:58 +05:30
Simon Pilgrim d3917bbfc6 [X86] Add title comment to separate the "CPU Families" features from the other subtarget features. NFCI.
Hopefully we can get rid of these some day...
2021-08-03 12:53:57 +01:00
Fraser Cormack cba6aab971 [RISCV] Support simple fractional steps in matching VID sequences
This patch extends the optimization of VID-sequence BUILD_VECTORs
introduced in D104921 to include simple fractional steps composed of a
separated integer numerator and denominator.

A notable limitation in this sequence detection is that only sequences
with steps N/1 or 1/D are found, meaning that the step between elements
and the frequency with which it changes is consistent across the whole
sequence. Fractional steps such as 2/3 won't be matched as those would
involve more complex tracking of state or some level of backtracking.

As is stands, however, this patch is sufficient to match common
interleave-type shuffle indices, for example matching `<0,0,1,1>` (or
commonly `<0,u,1,u>` or `<u,0,u,1>`) to an index sequence divided by 2.

While the optimization is relatively `undef`-tolerant, due to greedy
pattern-matching there even are some simple patterns which confuse the
sequence detection into identifying either a suboptimal sequence or no
sequence at all.

Currently only fractional-step sequences identified as having a
power-of-two denominator are actually lowered to RVV instructions. This
is to avoid introducing divisions into the generated code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106533
2021-08-03 10:38:24 +01:00
Jason Molenda 0d8cd4e2d5 [AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted
Add a comment when there is a shifted value,
    add x9, x0, #291, lsl #12 ; =1191936
but not when the immediate value is unshifted,
    subs x9, x0, #256 ; =256
when the comment adds nothing additional to the reader.

Differential Revision: https://reviews.llvm.org/D107196
2021-08-03 02:28:46 -07:00
Esme-Yi 69396896fb [llvm-readobj][XCOFF] Fix the error dumping for the first
item of StringTable.

Summary: For the string table in XCOFF, the first 4 bytes
contains the length of the string table, so we should
print the string entries from fifth bytes. This patch
also adds tests for llvm-readobj dumping the string
table.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D105522
2021-08-03 09:08:58 +00:00
David Sherwood 0156f91f3b [NFC] Rename enable-strict-reductions to force-ordered-reductions
I'm renaming the flag because a future patch will add a new
enableOrderedReductions() TTI interface and so the meaning of this
flag will change to be one of forcing the target to enable/disable
them. Also, since other places in LoopVectorize.cpp use the word
'Ordered' instead of 'strict' I changed the flag to match.

Differential Revision: https://reviews.llvm.org/D107264
2021-08-03 09:33:01 +01:00
Cullen Rhodes a02bbeeae7 [AArch64][AsmParser] NFC: Use helpers in matrix tile list parsing 2021-08-03 08:13:01 +00:00
Jay Foad 40202b13b2 [AMDGPU] Legalize operands of V_ADDC_U32_e32 and friends
These instructions have an implicit use of vcc which counts towards the
constant bus limit. Pre gfx10 this means that the explicit operands
cannot be sgprs. Use the custom inserter hook to call legalizeOperands
to enforce that restriction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51217

Differential Revision: https://reviews.llvm.org/D106868
2021-08-03 09:04:52 +01:00
Paulo Matos d3a0a65bf0 Reland: "[WebAssembly] Add new pass to lower int/ptr conversions of reftypes"
Add new pass LowerRefTypesIntPtrConv to generate debugtrap
instruction for an inttoptr and ptrtoint of a reference type instead
of erroring, since calling these instructions on non-integral pointers
has been since allowed (see ac81cb7e6).

Differential Revision: https://reviews.llvm.org/D107102
2021-08-03 09:20:51 +02:00
jacquesguan 7900ee0b61 [RISCV] Teach VSETVLI insertion to merge the unused VSETVLI with the one need to be insert after it.
If a vsetvli instruction is not compatible with the next vector instruction,
and there is no other things that may update or use VL/VTYPE, we could merge
it with the next vsetvli instruction that should be insert for the vector
instruction.

This commit only merge VTYPE with the former vsetvli instruction which has
the same VL.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106857
2021-08-03 12:06:59 +08:00
luxufan 0023caf952 [RuntimeDyldChecker] Delete comparision of integers of different signs 2021-08-03 11:38:25 +08:00
luxufan f4e418ac1e [RuntimeDyldChecker] Support offset in decode_operand expr
In RISCV's relocations, some relocations are comprised of two relocation types. For example, R_RISCV_PCREL_HI20 and R_RISCV_PCREL_LO12_I compose a PC relative relocation. In general the compiler will set a label in the position of R_RISCV_PCREL_HI20. So, to test the R_RISCV_PCREL_LO12_I relocation, we need decode instruction at position of the label points to R_RISCV_PCREL_HI20 plus 4 (the size of a riscv non-compress instruction).

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D105528
2021-08-03 11:25:51 +08:00
Shimin Cui 7ce98cf56e [GlobalOpt] Fix the assert for stored once non-pointer to global address
This is to fix the assert @bjope reported due to the code change of https://reviews.llvm.org/D106589. The test case from @bjope is also included.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107302
2021-08-02 19:23:29 -04:00
Eli Friedman 1f62af6346 [AArch64][SelectionDAG] Support passing/returning scalable vectors with unusual types.
This adds handling for two cases:

1. A scalable vector where the element type is promoted.
2. A scalable vector where the element count is odd (or more generally,
   not divisble by the element count of the part type).

(Some element types still don't work; for example, <vscale x 2 x i128>,
or <vscale x 2 x fp128>.)

Differential Revision: https://reviews.llvm.org/D105591
2021-08-02 15:53:16 -07:00
Roman Lebedev 6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Roman Lebedev 4ba3326f17
[InstCombine] `vector_reduce_{or,and}(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
This allows the expansion logic to actually trigger if the argument
was extended from i1 element type, like the rest of the reductions expect.

Alive2 agrees:
https://alive2.llvm.org/ce/z/wcfews (or zext)
https://alive2.llvm.org/ce/z/FCXNFx (or sext)
https://alive2.llvm.org/ce/z/f26zUY (and zext)
https://alive2.llvm.org/ce/z/jprViN (and sext)
2021-08-03 00:54:35 +03:00
Jessica Paquette bd13c8e610 [AArch64][GlobalISel] Emit extloads for ZExt/SExt values in assignValueToAddress
When a value is expected to be extended, we should emit an extended load rather
than a normal G_LOAD.

Add checklines to arm64-abi.ll which show that we now emit the correct loads.

For ease of comparison: https://godbolt.org/z/8WvY6EfdE

Differential Revision: https://reviews.llvm.org/D107313
2021-08-02 14:48:44 -07:00
Roman Lebedev 554fc9ad0a
[InstCombine] `vector_reduce_smax(?ext(<n x i1>))` --> `?ext(vector_reduce_{and,or}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/3oqir9 (self)
https://alive2.llvm.org/ce/z/6cuI5m (zext)
https://alive2.llvm.org/ce/z/4FL8rD (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Roman Lebedev f47b7b6d10
[InstCombine] `vector_reduce_smin(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/noXtZ8 (self)
https://alive2.llvm.org/ce/z/JNrN6C (zext)
https://alive2.llvm.org/ce/z/58snuN (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Nikita Popov c7770574f9 Revert "[unroll] Move multiple exit costing into consumer pass [NFC]"
This reverts commit 76940577e4.

This causes Transforms/LoopUnroll/ARM/multi-blocks.ll to fail.
2021-08-02 22:23:34 +02:00
Chang-Sun Lin, Jr b58eda39eb [ValueTracking] Fix computeConstantRange to use "may" instead of "always" semantics for llvm.assume
ValueTracking should allow for value ranges that may satisfy
llvm.assume, instead of restricting the ranges only to values that
will always satisfy the condition.

Differential Revision: https://reviews.llvm.org/D107298
2021-08-02 22:20:17 +02:00
Roman Lebedev b9b7162b8b
[InstCombine] `vector_reduce_umax(?ext(<n x i1>))` --> `?ext(vector_reduce_or(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/NbBaeT (self)
https://alive2.llvm.org/ce/z/iEaig4 (zext)
https://alive2.llvm.org/ce/z/meGb3y (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:23 +03:00
Roman Lebedev 0c13798056
[InstCombine] `vector_reduce_umin(?ext(<n x i1>))` --> `?ext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/XxUScW (self)
https://alive2.llvm.org/ce/z/3usTF- (zext)
https://alive2.llvm.org/ce/z/GVxwQz (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:22 +03:00
Philip Reames 76940577e4 [unroll] Move multiple exit costing into consumer pass [NFC]
This aligns the multiple exit costing with all the other cost decisions.  Note that UnrollAndJam, which is the only other caller of the original home of this code, unconditionally bails out of multiple exit loops.
2021-08-02 12:46:23 -07:00
Nikita Popov 380b8a603c [DFAJumpThreading] Use SmallPtrSet for Visited (NFC)
This set is only used for contains checks, so there is no need to
use std::set.
2021-08-02 21:30:25 +02:00
Nikita Popov 3f7aea1a37 [DFAJumpThreading] Use insert return value (NFC)
Rather than find + insert. Also use range based for loop.
2021-08-02 21:21:21 +02:00
Nikita Popov 84602f98c6 [DFAJumpThreading] Remove unnecessary includes (NFC)
This file uses neither unordered_map nor unordered_set.
2021-08-02 21:13:30 +02:00
Nikita Popov e97524cba2 [DFAJumpThreading] Mark DT as preserved in LegacyPM
It is marked as preserved in NewPM, but not LegacyPM.
2021-08-02 21:13:30 +02:00
Roman Lebedev 469793efa7
[InstCombine] `vector_reduce_mul(?ext(<n x i1>))` --> `zext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/PDansB (self)
https://alive2.llvm.org/ce/z/55D-Xc (zext)
https://alive2.llvm.org/ce/z/LxG3-r (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 21:57:51 +03:00
Philip Reames 9016beaa24 [unrollruntime] Pull out a helper function for readability and eventual reuse [nfc] 2021-08-02 11:47:27 -07:00
Paulo Matos 245f2ee647 Revert "[WebAssembly] Add new pass to lower int/ptr conversions of reftypes"
This reverts commit ce1c59dea6.
2021-08-02 20:12:25 +02:00
Philip Reames ebc4c4e3b0 [unroll] Add clarifying comment
The option to not preserve LCSSA is in fact not tested at all in upstream.  I was tempted to just remove the code entirely, but realized I didn't need to for my actual goal.
2021-08-02 10:44:56 -07:00
Alexander Yermolovich 5a865b0b1e [DWARF] Don't process .debug_info relocations for DWO Context
When we build with split dwarf in single mode the .o files that contain both "normal" debug sections and dwo sections, along with relocaiton sections for "normal" debug sections.
When we create DWARF context in DWARFObjInMemory we process relocations and store them in the map for .debug_info, etc section.
For DWO Context we also do it for non dwo dwarf sections. Which I believe is not necessary. This leads to a lot of memory being wasted. We observed 70GB extra memory being used.

I went with context sensitive approach, flag is passed in. I am not sure if it's always safe not to process relocations for regular debug sections if Obj contains .dwo sections.
If it is alternatvie might be just to scan, in constructor, sections and if there are .dwo sections not to process regular debug ones.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D106624
2021-08-02 10:41:47 -07:00
Paulo Matos ce1c59dea6 [WebAssembly] Add new pass to lower int/ptr conversions of reftypes
Add new pass LowerRefTypesIntPtrConv to generate trap
instruction for an inttoptr and ptrtoint of a reference type instead
of erroring, since calling these instructions on non-integral pointers
has been since allowed (see ac81cb7e6).

Differential Revision: https://reviews.llvm.org/D107102
2021-08-02 19:40:00 +02:00
Florian Hahn bb725c9803
[VPlan] Use defined and ops VPValues to print VPInterleaveRecipe.
This patch updates VPInterleaveRecipe::print to print the actual defined
VPValues for load groups and the store VPValue operands for store
groups.

The IR references may become outdated while transforming the VPlan and
the defined and stored VPValues always are up-to-date.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D107223
2021-08-02 18:36:36 +01:00
Roman Lebedev 1e801439be
[InstCombine] `xor` reduction w/ i1 elt type is a parity check
For i1 element type, `xor` and `add` are interchangeable
(https://alive2.llvm.org/ce/z/e77hhQ), so we should treat it just like
an `add` reduction and consistently transform them both:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)

Though, let's emit the IR that is similar to the one we produce for
`vector_reduce_add(<n x i1>)`.

See https://bugs.llvm.org/show_bug.cgi?id=51259
2021-08-02 20:21:37 +03:00
Thomas Lively 417e500668 [WebAssembly] Compute known bits for SIMD bitmask intrinsics
This optimizes out the mask when the result of a bitmask is interpreted as an i8
or i16 value. Resolves PR50507.

Differential Revision: https://reviews.llvm.org/D107103
2021-08-02 09:52:34 -07:00
David Green c423a586a7 [ARM] Remove setPreservesCFG from ARMBlockPlacement
As of 2829391840 it no longer preserves the CFG, needing to
split blocks in order to add DLS instructions.
2021-08-02 14:15:45 +01:00
Irina Dobrescu b01417d3c5 [AArch64] Optimise min/max lowering in ISel
Differential Revision: https://reviews.llvm.org/D106561
2021-08-02 13:40:21 +01:00
Carl Ritson 675c942373 [AMDGPU] Disable NSA for BVH instructions when appropriate
Check maximum NSA size when selecting NSA or non-NSA BVH instructions.

Differential Revision: https://reviews.llvm.org/D103230
2021-08-02 20:09:26 +09:00
Florian Mayer 66b4aafa2e [hwasan] Detect use after scope within function.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D105201
2021-08-02 11:34:12 +01:00
Simon Pilgrim 7397dcb403 [TTI] Add basic SK_InsertSubvector shuffle mask recognition
This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix a x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188

This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion.

The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic.

The widening case will be addressed in a follow up patch that treats the cost as 0.

Masks with a high number of undef elts will still struggle to match optimal subvector widths - its currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors.

Differential Revision: https://reviews.llvm.org/D107228
2021-08-02 11:23:44 +01:00
Simon Pilgrim 0579050116 Fix MSVC signed/unsigned comparison warning. NFCI. 2021-08-02 11:23:43 +01:00
Rosie Sumpter f117ed542f [LoopFlatten] Fix missed LoopFlatten opportunity
When the limit of the inner loop is a known integer, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.

Differential Revision: https://reviews.llvm.org/D105802
2021-08-02 11:09:54 +01:00
David Green 2829391840 [ARM] Revert WLSTP to DLSTP if the target block is out of range
If the block target for a WLSTP instruction is known to be out of range,
and cannot be fixed by the ARMBlockPlacementPass, we can relax it to a
DLSTP (and cmp/branch) to still allow the creation of tail predicated
loops. That is what this patch does, adding extra revert code to the
fallback path of ARMBlockPlacementPass.

Due to the code produced when reverting, this creates a DLSTP between a
Bcc and a Br. As a DLS isn't necessarily a terminator we need to split
the block to move the DLS/Br into.

Differential Revision: https://reviews.llvm.org/D104709
2021-08-02 10:59:52 +01:00
Cullen Rhodes 7ed0120d84 [AArch64][AsmParser] NFC: Parser.Lex() -> Lex()
Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D107146
2021-08-02 09:48:41 +00:00
Max Kazantsev c5b63714b5 [GC][NFC] Make getGCStrategy by name available in IR
We might want to use info from GC strategy in middle end analysis.
The motivation for this is provided in D99135: we may want to ask
a GC if it's going to work with a given pointer (currently this code
makes naive check by the method name).

Differetial Revision: https://reviews.llvm.org/D100559
Reviewed By: reames
2021-08-02 14:26:04 +07:00
Carl Ritson a441de6d94 [AMDGPU][GlobalISel] Add missing default mapping for BVH intrinsics
Application of default mapping to BVH intrinsics was missing.
Copy parts of SelectionDAG test to GlobalISel test as these would
have indicated this error.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D107211
2021-08-02 12:43:38 +09:00
Freddy Ye d268c20070 [X86] Support auto-detect for tigerlake and alderlake
Differential Revision: https://reviews.llvm.org/D107245
2021-08-02 11:01:01 +08:00
Shimin Cui 732b05555c [GlobalOpt] support ConstantExpr use of global address for OptimizeGlobalAddressOfMalloc
I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D106589
2021-07-31 18:42:02 -04:00
Hsiangkai Wang 8b33839f01 [RISCV] Rename vector inline constraint from 'v' to 'vr' and 'vm' in IR.
Differential Revision: https://reviews.llvm.org/D107139
2021-08-01 05:58:17 +08:00
Eli Friedman bdd55b2f18 Fix the default alignment of i1 vectors.
Currently, the default alignment is much larger than the actual size of
the vector in memory.  Fix this to use a sane default.

For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.

This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway.  If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).

Differential Revision: https://reviews.llvm.org/D88994
2021-07-31 14:09:59 -07:00
Eli Friedman 2a2847823f [ConstantFold] Get rid of special cases for sizeof etc.
Target-dependent constant folding will fold these down to simple
constants (or at least, expressions that don't involve a GEP).  We don't
need heroics to try to optimize the form of the expression before that
happens.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51232 .

Differential Revision: https://reviews.llvm.org/D107116
2021-07-31 13:20:47 -07:00
Sanjay Patel 7f55557765 [Analysis] improve function signature checking for snprintf
The check for size_t parameter 1 was already here for snprintf_chk,
but it wasn't applied to regular snprintf. This could lead to
mismatching and eventually crashing as shown in:
https://llvm.org/PR50885
2021-07-31 15:17:20 -04:00
Craig Topper 593059b328 [RISCV] Rename RISCVISD::FCVT_W_RV64 to FCVT_W_RTZ_RV64. NFC
fcvt.w(u) supports multiple rounding modes, but the ISD node
doesn't encode that. So name it to match the rounding mode it uses.
2021-07-31 11:14:59 -07:00
Sanjay Patel f2a322bfcf [SROA] prevent crash on large memset length (PR50910)
I don't know much about this pass, but we need a stronger
check on the memset length arg to avoid an assert. The
current code was added with D59000.
The test is reduced from:
https://llvm.org/PR50910

Differential Revision: https://reviews.llvm.org/D106462
2021-07-31 14:07:30 -04:00
Sanjay Patel a22c99c3c1 [InstCombine] canonicalize cmp-of-bitcast-of-vector-cmp to use zero constant
We can invert a compare constant and preserve the logic
as shown in this sampling:
https://alive2.llvm.org/ce/z/YAXbfs
(In theory, we could deal with non-all-ones/zero as well,
but it doesn't seem worthwhile.)

I noticed this as a part of the x86 codegen difference in
https://llvm.org/PR51259 - it ends up using "test"
instead of "not + cmp" in that example.

This pattern also shows up in https://llvm.org/PR41312
and https://llvm.org/PR50798 .

Differential Revision: https://reviews.llvm.org/D107170
2021-07-31 13:31:12 -04:00
David Green 15a1d7e839 [ARM] Switch order of creating VADDV and VMLAV.
It can be beneficial to attempt to try the larger VMLAV patterns before
VADDV, in case both may match the same code.
2021-07-31 16:28:52 +01:00
Matt Arsenault ebc17a0d68 GlobalISel: Scalarize unaligned vector stores
This has the same problems and limitations as the load path.
2021-07-31 10:37:15 -04:00
Simon Pilgrim 3a7c82efb8 [DAG] isGuaranteedNotToBeUndefOrPoison - handle ISD::BUILD_VECTOR nodes
If all demanded elements of the BUILD_VECTOR pass a isGuaranteedNotToBeUndefOrPoison check, then we can treat this specific demanded use of the BUILD_VECTOR as guaranteed not to be undef or poison either.

Differential Revision: https://reviews.llvm.org/D107174
2021-07-31 15:08:25 +01:00
Matt Arsenault bc2cb91a20 GlobalISel: Have lowerStore handle some unaligned stores
This is NFC until some of the AMDGPU legalization rules are ripped
out.
2021-07-31 10:01:42 -04:00
Alexandros Lamprineas 7d940432c4 [AArch64] Legalize MVT::i64x8 in DAG isel lowering
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.

Differential Revision: https://reviews.llvm.org/D94097
2021-07-31 09:51:28 +01:00
Alexandros Lamprineas 3094e5389b [AArch64] Add a Machine Value Type for 8 consecutive registers
Adds MVT::i64x8, a Machine Value Type needed for lowering inline assembly
operands which materialize a sequence of eight general purpose registers.

Differential Revision: https://reviews.llvm.org/D94096
2021-07-31 09:51:28 +01:00
Petr Hosek 83302c8489 [profile] Fix profile merging with binary IDs
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.

In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.

Differential Revision: https://reviews.llvm.org/D107143
2021-07-30 18:54:27 -07:00
Petr Hosek d3dd07e3d0 Revert "[profile] Fix profile merging with binary IDs"
This reverts commit dcadd64986.
2021-07-30 18:53:48 -07:00
Petr Hosek dcadd64986 [profile] Fix profile merging with binary IDs
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.

In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.

Differential Revision: https://reviews.llvm.org/D107143
2021-07-30 17:38:53 -07:00
Petr Hosek 6ea2f31f3d Revert "[profile] Fix profile merging with binary IDs"
This reverts commit 89d6eb6f8c, this
seemed to have break a few builders.
2021-07-30 14:32:52 -07:00
Florian Mayer b5b023638a Revert "[hwasan] Detect use after scope within function."
This reverts commit 84705ed913.
2021-07-30 22:32:04 +01:00
Brendon Cahoon c4c379d633 [LoopStrengthReduction] Fix pointer extend asserts
Additional asserts were added to ScalarEvolution to enforce
pointer/int type rules. An assert is triggered when the LSR pass
attempts to extend a pointer SCEV in GenerateTruncates.

This patch changes GenerateTruncates to exit early if the Formaula
contains a ScaledReg or BaseReg with a pointer type.

Differential Revision: https://reviews.llvm.org/D107185
2021-07-30 17:24:08 -04:00
Petr Hosek 89d6eb6f8c [profile] Fix profile merging with binary IDs
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.

In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.

Differential Revision: https://reviews.llvm.org/D107143
2021-07-30 13:30:30 -07:00
Rahman Lavaee 2256b359d7 Explain the symbols of basic block clusters with an example in the header comments.
This prevents from confusion with the ``labels`` option.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D107128
2021-07-30 12:08:04 -07:00
Fangrui Song a1532ed275 [InstrProfiling] Make CountersPtr in __profd_ relative
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.

```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo

# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo

# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```

(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059  # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```

The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.

Differential Revision: https://reviews.llvm.org/D104556
2021-07-30 11:52:18 -07:00
David Green 69cdadddec [ARM] Distribute reductions based on ascending load offset
This distributes reductions based on the relative offset of loads, if
one is found from their operands. Given chains of reductions this will
then sort them in ascending load order, which in turn can help simple
prefetches latch on to increasing strides more easily.

Differential Revision: https://reviews.llvm.org/D106569
2021-07-30 19:50:07 +01:00
Simon Pilgrim afc6b09dee [InstCombine] getMaskedTypeForICmpPair - remove dead code. NFCI.
Ok should be true at this point, so the early-out is dead - replace with an assert.
2021-07-30 19:23:05 +01:00
Simon Pilgrim 3c0b596ecc SelectionDAGDumper.cpp - remove nested if-else return chain. NFCI.
Match style and don't use an else after a return.
2021-07-30 19:23:05 +01:00
Simon Pilgrim 986841cca2 SelectionDAGDumper.cpp - printrWithDepthHelper - remove dead code. NFCI.
Fixes coverity warning - we have an early-out for unsigned depth == 0, so the depth < 1 early-out later on is dead code.
2021-07-30 19:23:04 +01:00
Matt Arsenault e46badd4e9 GlobalISel: Have lowerLoad scalarize unaligned vectors
This could be smarter by picking an ideal type, or at least splitting
the vector in half first. Also handles lower for non-power-of-2,
non-extending vector loads.

Currently this just avoids failing to legalize some odd vector AMDGPU
tests, but is a step towards removing the split logic from the
NarrowScalar logic.
2021-07-30 13:23:29 -04:00
Alexey Bataev 95e5d401ae [SLP]Improve splats vectorization.
Replace insertelement instructions for splats with just single
insertelement + broadcast shuffle. Also, try to merge these instructions
if they come from the same/shuffled gather node.

Differential Revision: https://reviews.llvm.org/D107104
2021-07-30 10:17:45 -07:00
Kerry McLaughlin 9d35594993 Reland "[LV] Use lookThroughAnd with logical reductions"
If a reduction Phi has a single user which `AND`s the Phi with a type mask,
`lookThroughAnd` will return the user of the Phi and the narrower type represented
by the mask. Currently this is only used for arithmetic reductions, whereas loops
containing logical reductions will create a reduction intrinsic using the widened
type, for example:

  for.body:
    %phi = phi i32 [ %and, %for.body ], [ 255, %entry ]
    %mask = and i32 %phi, 255
    %gep = getelementptr inbounds i8, i8* %ptr, i32 %iv
    %load = load i8, i8* %gep
    %ext = zext i8 %load to i32
    %and = and i32 %mask, %ext
    ...

^ this will generate an and reduction intrinsic such as the following:
    call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...)

The same example for an add instruction would create an intrinsic of type i8:
    call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...)

This patch changes AddReductionVar to call lookThroughAnd for other integer
reductions, allowing loops similar to the example above with reductions such
as and, or & xor to vectorize.

Reviewed By: david-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D105632
2021-07-30 18:04:09 +01:00
Matt Arsenault f19226dda5 GlobalISel: Have load lowering handle some unaligned accesses
The code for splitting an unaligned access into 2 pieces is
essentially the same as for splitting a non-power-of-2 load for
scalars. It would be better to pick an optimal memory access size and
directly use it, but splitting in half is what the DAG does.

As-is this fixes handling of some unaligned sextload/zextloads for
AMDGPU. In the future this will help drop the ugly abuse of
narrowScalar to handle splitting unaligned accesses.
2021-07-30 12:55:58 -04:00
Matt Arsenault faccf427df AMDGPU/GlobalISel: Remove special case lowering for non-pow-2 stores
We end up with extra copies from buildAnyExtOrTrunc if these are
lowered after the register types are legalized.
2021-07-30 12:37:29 -04:00
Kazu Hirata e76ddfa9ef [Transforms] Remove HasValueForBlock (NFC)
The function seems to be unused for at least one year.
2021-07-30 08:56:49 -07:00
Dylan Fleming a7a39ec886 [SVE] Add folds for sign and zero extends of vscale
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105994
2021-07-30 16:02:50 +01:00
David Green 532d05b714 [ARM] Attempt to distribute reductions
This adds a combine for adds of reductions, distributing them so that
they occur sequentially to enable better use of accumulating VADDVA
instructions. It combines:
  add(X, add(vecreduce(Y), vecreduce(Z))) ->
    add(add(X, vecreduce(Y)), vecreduce(Z))
and
  add(add(A, reduce(B)), add(C, reduce(D))) ->
    add(add(add(A, C), reduce(B)), reduce(D))

These together distribute the add's so that more reductions can be
selected to VADDVA.

Differential Revision: https://reviews.llvm.org/D106532
2021-07-30 14:48:31 +01:00
Florian Mayer 84705ed913 [hwasan] Detect use after scope within function.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D105201
2021-07-30 13:59:36 +01:00
Alexey Bataev 4b25c11321 [SLP]Fix an assertion for the size of user nodes.
For the nodes with reused scalars the user may be not only of the size
of the final shuffle but also of the size of the scalars themselves,
need to check for this. It is safe to just modify the check here, since
the order of the scalars themselves is preserved, only indeces of the
reused scalars are changed. So, the users with the same size as the
number of scalars in the node, will not be affected, they still will get
the operands in the required order.

Reported by @mstorsjo in D105020.

Differential Revision: https://reviews.llvm.org/D107080
2021-07-30 05:46:44 -07:00
Alexey Bataev f4fb854811 [SLP]Do not consider deleted instruction as external users.
If the instruction was previously deleted, it should not be treated as
an external user. This fixes cost estimation and removes dead
extractelement instructions.

Differential Revision: https://reviews.llvm.org/D107106
2021-07-30 05:37:43 -07:00
Alexey Bataev c2deb2afaf [SLP]Fix a crash in gathered loads analysis.
Need to check that the minimum acceptable vector factor is at least 2,
not 0, to avoid compiler crash during gathered loads analysis.

Differential Revision: https://reviews.llvm.org/D107058
2021-07-30 05:19:17 -07:00
Alex Zinenko aa426c372c [OMPIRBuilder] add minimalist reduction support
This introduces a builder function for emitting IR performing reductions in
OpenMP. Reduction variable privatization and initialization to the
reduction-neutral value is expected to be handled separately. The caller
provides the reduction functions. Further commits can provide implementation of
reduction functions for the reduction operators defined in the OpenMP
specification.

This implementation was tested on an MLIR fork targeting OpenMP from C and
produced correct executable code.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D104928
2021-07-30 13:58:26 +02:00
David Green 4b56306762 [ARM] Turn vecreduce_add(add(x, y)) into vecreduce(x) + vecreduce(y)
Under MVE we can use VADDV/VADDVA's to perform integer add reductions,
so it can be beneficial to use more reductions than summing subvectors
and reducing once. Especially for VMLAV/VMLAVA the mul can be
incorporated into the reduction, producing less instructions.

Some of the test cases currently get larger due to extra integer adds,
but will be improved in a followup patch.

Differential Revision: https://reviews.llvm.org/D106531
2021-07-30 10:10:41 +01:00
Cullen Rhodes 3a349d2269 [AArch64][SME] Introduce feature for streaming mode
The Scalable Matrix Extension (SME) introduces a new execution mode
called Streaming SVE mode. In streaming mode a substantial subset of the
SVE and SVE2 instruction set is available, along with new outer product,
load, store, extract and insert instructions that operate on the new
architectural register state for the matrix.

To support streaming mode this patch introduces a new subtarget feature
+streaming-sve. If enabled, the subset of SVE(2) instructions are
available. The existing behaviour for SVE(2) remains unchanged, the
subset of instructions that are legal in streaming mode are enabled if
either +sve[2] or +streaming-sve is specified. Instructions that are
illegal in streaming mode remain predicated on +sve[2].

The SME target feature has been updated to imply +streaming-sve rather
than +sve.

The following changes are made to the SVE(2) tests:
  * For instructions that are legal in streaming mode:
    - added RUN line to verify +streaming-sve enables the instruction.
    - updated diagnostic to 'instruction requires: streaming-sve or sve'.
  * For instructions that are illegal in streaming-mode:
    - added RUN line to verify +streaming-sve does not enable the
      instruction.

SVE(2) instructions that are legal in streaming mode have:

  if !HaveSVE[2]() && !HaveSME() then UNDEFINED;

at the top of the pseudocode in the XML.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D106272
2021-07-30 07:30:45 +00:00
Lang Hames 8a241cd9c2 [JITLink][ELF][x86-64] Include relocation name in missing relocation errors.
This saves a level of manual table lookup for those of us who don't remember
ELF relocation numbers off the top of our heads.
2021-07-30 15:19:11 +10:00
Tarindu Jayatilaka 7a797b2902 Take OptimizationLevel class out of Pass Builder
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D107025
2021-07-29 21:57:23 -07:00
Stefan Pintilie 754520a2bf [PowerPC] Fix issue where hint was providing the incorrect regsiter class.
Regsier hints when copying to a UACC register do not always produce VSRp
registers. This patch makes sure that we do not produce hints in cases
where the subregsiter of the UACC is not a VSRp.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D107101
2021-07-29 21:10:45 -05:00
Esme-Yi 8011fc1953 [yaml2obj] Enable support for parsing 64-bit XCOFF.
Summary: Add support for yaml2obj to parse 64-bit XCOFF.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D100375
2021-07-30 02:06:04 +00:00
Mark Schimmel e622c99f30 [ARC] Add norm/normh instructions with disassembly tests
Add disassembler support for the NORM and NORMH instructions. These instructions
only exist when the ARC processor is configured with the "norm" extension.

fferential Revision: https://reviews.llvm.org/D107118
2021-07-29 17:54:52 -07:00
Ben Shi bb6fddb63c Optimize mul in the zba extension with SH*ADD
This patch does the following optimization of mul with a constant.

(mul x, 11) -> (SH1ADD (SH2ADD x, x), x)
(mul x, 19) -> (SH1ADD (SH3ADD x, x), x)
(mul x, 13) -> (SH2ADD (SH1ADD x, x), x)
(mul x, 21) -> (SH2ADD (SH2ADD x, x), x)
(mul x, 37) -> (SH2ADD (SH3ADD x, x), x)
(mul x, 25) -> (SH3ADD (SH1ADD x, x), x)
(mul x, 41) -> (SH3ADD (SH2ADD x, x), x)
(mul x, 73) -> (SH3ADD (SH3ADD x, x), x)
(mul x, 27) -> (SH1ADD (SH3ADD x, x), (SH3ADD x, x))
(mul x, 45) -> (SH2ADD (SH3ADD x, x), (SH3ADD x, x))
(mul x, 81) -> (SH3ADD (SH3ADD x, x), (SH3ADD x, x))

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107065
2021-07-30 08:36:28 +08:00
Thomas Johnson cc238a6e03 [ARC] Add additional mov immediate instruction formats with a fix for u6 decoding
Differential Revision: https://reviews.llvm.org/D107088
2021-07-29 16:41:55 -07:00
Joseph Huber cd0dd8ece8 [OpenMP] Adding flags for disabling the following optimizations: Deglobalization SPMDization State machine rewrites Folding
This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
 - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
 - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
 - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
 - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106802
2021-07-29 19:28:31 -04:00
Adrian Prantl c5d84d2eb3 GlobalISel/AArch64: don't optimize away redundant branches at -O0
This patch prevents GlobalISel from optimizing out redundant branch
instructions when compiling without optimizations.

The motivating example is code like the following common pattern in
Swift, where users expect to be able to set a breakpoint on the early
exit:

public func f(b: Bool) {
  guard b else {
    return // I would like to set a breakpoint here.
  }
  ...
}

The patch modifies two places in GlobalISEL: The first one is in
IRTranslator.cpp where the removal of redundant branches is made
conditional on the optimization level. The second one is in
AArch64InstructionSelector.cpp where an -O0 *only* optimization is
being removed.

Disabling these optimizations increases code size at -O0 by
~8%. However, doing so improves debuggability, and debug builds are
the primary reason why developers compile without optimizations. We
thus concluded that this is the right trade-off.

rdar://79515454

This tenatively reapplies the patch without modifications, the LLDB
test that has blocked this from landing previously has since been
modified to hopefully no longer be sensitive to this change.

Differential Revision: https://reviews.llvm.org/D105238
2021-07-29 16:04:22 -07:00
David Green d4a2daa919 [ARM] Define a couple more ssub indexes. NFC
Same as 91bd3ad128, this doesn't really
change anything but gives the registers better names than the ones
tablegen would define. And fills in the missing gaps.
2021-07-29 23:00:35 +01:00
Amara Emerson c54d5c9756 [GlobalISel] Use GMergeLikeOp to simplify a combine. NFC. 2021-07-29 13:53:16 -07:00
Andy Kaylor b4d945bacd Fixing an infinite loop problem in InstCombine
Patch by Mohammad Fawaz

This issues started happening after
b373b5990d
Basically, if the memcpy is volatile, the collectUsers() function should
return false, just like we do for volatile loads.

Differential Revision: https://reviews.llvm.org/D106950
2021-07-29 12:57:17 -07:00
Sander de Smalen 84a4caeb84 [InstSimplify] Don't assume parent function when simplifying llvm.vscale.
D106850 introduced a simplification for llvm.vscale by looking at the
surrounding function's vscale_range attributes. The call that's being
simplified may not yet have been inserted into the IR. This happens for
example during function cloning.

This patch fixes the issue by checking if the instruction is in a
parent basic block.
2021-07-29 20:08:08 +01:00
Amara Emerson 532c458fa8 [GlobalISel] Add GPtrAdd and use it in some combines. 2021-07-29 12:04:02 -07:00
Dawid Jurczak 5c315bee8c [DSE] Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-29 18:34:10 +02:00
Jessica Clarke 95ef464ac9 Handle subregs and superregs in callee-saved register mask
If a target lists both a subreg and a superreg in a callee-saved
register mask, the prolog will spill both aliasing registers. Instead,
don't spill the subreg if a superreg is being spilled. This case is hit by the
PowerPC SPE code, as well as a modified RISC-V backend for CHERI I maintain out
of tree.

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D73170
2021-07-29 16:53:29 +01:00
Rosie Sumpter fab5659c79 Revert "[LoopFlatten] Fix missed LoopFlatten opportunity"
This reverts commit 2df8bf9339.

Reverting because it causes an assertion failure.
2021-07-29 15:52:45 +01:00
Sanjay Patel fa6b2c9915 [DAGCombiner] don't try to partially reduce add-with-overflow ops
This transform was added with D58874, but there were no tests for overflow ops.
We need to change this one way or another because it can crash as shown in:
https://llvm.org/PR51238

Note that if there are no uses of an overflow op's bool overflow result, we
reduce it to a regular math op, so we continue to fold that case either way.
If we have uses of both the math and the overflow bool, then we are likely
not saving anything by creating an independent sub instruction as seen in
the test diffs here.

This patch makes the behavior in SDAG consistent with what we do in
instcombine AFAICT.

Differential Revision: https://reviews.llvm.org/D106983
2021-07-29 08:51:54 -04:00
Andrew Savonichev bcc83a2e83 [MCA] Use LSU for the in-order pipeline
Load/Store unit is used to enforce order of loads and stores if they
alias (controlled by --noalias=false option).

Fixes PR50483 - [MCA] In-order pipeline doesn't track memory
load/store dependencies.

Differential Revision: https://reviews.llvm.org/D103955
2021-07-29 14:40:23 +03:00
Bradley Smith 191831e380 [AArch64][SVE] Fix incorrect mask type when lowering fixed type SVE gather/scatter
An incorrect mask type when lowering an SVE gather/scatter was causing
a codegen fault which manifested as the incorrect predicate size being
used for an SVE gather/scatter, (e.g.. p0.b rather than p0.d).

Fixes PR51182.

Differential Revision: https://reviews.llvm.org/D106943
2021-07-29 11:22:17 +00:00
Jeremy Morse 2537120c87 Follow-up to D105207, only salvage affine SCEVs to avoid a crash
SCEVToIterCountExpr only expects to be fed affine expressions, but
DbgRewriteSalvageableDVIs is feeding it non-affine induction variables.
Following this up with an obvious fix, will add test coverage too if
this avoids D105207 being reverted.
2021-07-29 11:48:08 +01:00
Cullen Rhodes 08d92dbbff [AArch64][AsmParser] NFC: Parser.getTok() -> getTok()
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106949
2021-07-29 10:18:54 +00:00
Amara Emerson da61ab8475 [AArch64][GlobalISel] More widenToNextPow2 changes, this time for arithmetic/bitwise ops. 2021-07-29 03:02:29 -07:00
Mirko Brkusanin 971f4173f8 [AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary
While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.

This fixes a Vulkan CTS test that would hang otherwise.

Differential Revision: https://reviews.llvm.org/D105709
2021-07-29 11:20:49 +02:00
Rosie Sumpter 2df8bf9339 [LoopFlatten] Fix missed LoopFlatten opportunity
When the trip count of the inner loop is a constant, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.

Differential Revision: https://reviews.llvm.org/D105802
2021-07-29 09:47:41 +01:00
Fraser Cormack 02dd4b59bc [RISCV] Optimize floating-point "dominant value" BUILD_VECTORs
This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.

While this optimization could do with a proper cost model to weigh the
benfits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with any heuristic is likely to be both more obtuse and inaccurate.

Until then, this patch fixes at least one known obvious deficiency.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106963
2021-07-29 09:22:34 +01:00
Jun Ma e2fe26e77b [NFC][InstSimplify] Use more intuitive variable names. 2021-07-29 13:55:47 +08:00
Guozhi Wei 50b6273145 [MBP] findBestLoopTopHelper should exit if OldTop is not a chain header
Function findBestLoopTopHelper tries to find a new loop top block which can also
fall through to OldTop, but it's impossible if OldTop is not a chain header, so
it should exit immediately.

Differential Revision: https://reviews.llvm.org/D106329
2021-07-28 19:00:45 -07:00
Ben Shi 264b8e2a20 [RISCV] Optimize mul in the zba extension with SH*ADD
This patch makes the following optimization, if the
immediate multiplier is not a simm12.

(mul x, (power_of_2 + 2)) => (SH1ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 4)) => (SH2ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 8)) => (SH3ADD x, (SLLI x, bits))

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106648
2021-07-29 09:46:41 +08:00
Wenlei He 1a8087adaf [ThinLTO] Disallow importing for functions with indir branch to block address
We don't allowing inlining for functions with blockaddress with uses other than strictly callbr. This is because if the blockaddress escapes the function via a global variable, inlining may lead to an invalid cross-function reference.

We check against such cases during inlining, however the check can fail for ThinLTO post-link because CFG simplification can incorrectly removes blocks based on wrong block reachability.

When we import a function with blockaddress taken in a global variable but without importing that variable, we won't go through value mapping to reflect the real address-taken-ness of the cloned blocks. For the imported clone, this leads to blocks reachable from indirect branch through global variable being incorrectly treated as unreachable and removed by SimplifyCFG.

Since inlining for such cases shouldn't be allowed in the first place, I'm marking them as ineligible for importing during pre-link to save the problem of missing address-taken-ness of imported clone as well as bad DCE and inlining.

Differential Revision: https://reviews.llvm.org/D106930
2021-07-28 18:02:48 -07:00
Daniel Rodríguez Troitiño d6704e5ed9 [llvm-objcopy][MachO] Ignore all LC_SUB_* commands.
The LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA, LC_SUB_CLIENT, and LC_SUB_LIBRARY
are used to indicate related libraries, binaries or framework names.
Their only payload is the string with the name of the object. Adding
those commands to the list of ignored/skipped load commands will avoid
an error that stop the process of copying/stripping and will copy their
contents verbatim.

Additionally, in order to have a test for this case, `yaml2obj` now
allows those four commands to contain a `Content`.

Differential Revision: https://reviews.llvm.org/D106412
2021-07-28 17:35:26 -07:00
Jessica Paquette 5a333dc5da [AArch64][GlobalISel] Improve legalization for odd-type G_LOAD
Swap the order of widening so that we widen to the next power-of-2 first when
legalizing G_LOAD.

Also, provide a minimum type for the power of 2 to disallow s2 + s1. Clamping
ought to disallow s2 and s1, but I think it's better to be explicit about the
expected minimum size.

We probably need a similar change for G_STORE, but it seems to be a bit more
finnicky. So, let's just handle G_LOAD for now.

Differential Revision: https://reviews.llvm.org/D107013
2021-07-28 17:19:14 -07:00
Joseph Huber adbaa39dfc [Attributor] Change function internalization to not replace uses in internalized callers
The current implementation of function internalization creats a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of eachother. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106931
2021-07-28 18:57:28 -04:00
Jessica Paquette c0a41c3d3b [AArch64][GlobalISel] Improve legalization for odd-sized G_ICMP/G_CONSTANT
We were handing types like s88 like

1) clamp to the range
2) widen to the next power of 2

This isn't desirable because it causes an odd breakdown for types like s88.
If we widen to the next power of 2 (s128) first, then we get a clean breakdown
when we clamp back to s64.

Differential Revision: https://reviews.llvm.org/D106998
2021-07-28 15:31:33 -07:00
Chris Jackson 0ba8595287 [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
Reapply commit d675b594f4 that was
reverted due to buildbot failures. A simple fix has been applied to
remove an assertion.

Differential Revision: https://reviews.llvm.org/D105207
2021-07-28 23:04:59 +01:00
Eli Friedman 4adcff0b70 [ARM] Fix llvm-objdump disassembly of armv7m object files.
Apparently, the features were getting mixed up, so we'd try to
disassemble in ARM mode. Fix sub-architecture detection to compute the
correct triple if we're detecting it automatically, so the user doesn't
need to pass --triple=thumb etc.

It's possible we should be somehow tying the "+thumb-mode" target
feature more directly to Tag_CPU_arch_profile? But this seems to work
reasonably well, anyway.

While I'm here, fix up the other llvm-objdump tests that were explicitly
specifying an ARM triple; that shouldn't be necessary.

Differential Revision: https://reviews.llvm.org/D106912
2021-07-28 11:41:54 -07:00
Patrick Holland dbed061bf1 [MCA] Moving the target specific CustomBehaviour impl. from /tools/llvm-mca/ to /lib/Target/.
Differential Revision: https://reviews.llvm.org/D106775
2021-07-28 11:23:18 -07:00
Sjoerd Meijer bc43078fe8 [LoopFlatten] Fix bug where SCEVCouldNotCompute object is used
The SCEV method getBackedgeTakenCount() returns a SCEVCouldNotCompute
object if the backedge-taken count is unpredictable. This fix ensures
there is no longer an attempt to use such an object to find the trip
count.

Patch by: Rosie Sumpter.

Differential Revision: https://reviews.llvm.org/D106970
2021-07-28 18:35:08 +01:00
Jeroen Dobbelaere 03b8c69d06 [PredicateInfo] Use Intrinsic::getDeclaration now that it handles unnamed types.
This is a second attempt to fix the EXPENSIVE_CHECKS issue that was mentioned  In D91661#2875179 by @jroelofs.

(The first attempt was in D105983)

D91661 more or less completely reverted D49126 and by doing so also removed the cleanup logic of the created declarations and calls.
This patch is a replacement for D91661 (which must itself be reverted first). It replaces the custom declaration creation with the
generic version and shows the test impact. It also tracks the number of NamedValues to detect if a new prototype was added instead
of looking at the available users of a prototype.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D106147
2021-07-28 19:30:29 +02:00
Jeroen Dobbelaere dc5570d149 Revert "Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types."
This reverts commit 77080a1eb6.

This change introduced issues detected with EXPENSIVE_CHECKS. Reverting to restore the
needed function cleanup. A next patch will then just improve on the name mangling.
2021-07-28 19:30:29 +02:00
Fangrui Song 6da3d8b19c [llvm] Replace LLVM_ATTRIBUTE_NORETURN with C++11 [[noreturn]]
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.

Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
2021-07-28 09:31:14 -07:00
Craig Topper 3106f85945 [RISCV] Fix grammar in a comment. NFC 2021-07-28 09:09:26 -07:00
Craig Topper 54588bcc05 [RISCV] Restrict performANY_EXTENDCombine to prevent an infinite loop.
The sign_extend we insert here can get turned into a zero_extend if
the sign bit is known zero. This can enable a setcc combine that
shrinks compares with zero_extend. This reduces the use count of
the zero_extend allowing other combines to turn it back into an
any_extend.

This restricts the combine to only cases where the result is used
by a CopyToReg. This works for my original motivating case. I
hope the CopyToReg use will prevent any converted extends from
turning back into an any_extend.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106754
2021-07-28 09:05:45 -07:00
Chris Jackson 3992896043 Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR"
Reverted due to buildbot failures.
This reverts commit d675b594f4.
2021-07-28 16:44:54 +01:00
Chris Jackson d675b594f4 [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
Reapply commit 796b84d26f that was
reverted due to reports of crashes. A minor change now guards against
getVariableLocationOperand() returning a nullptr.

Differential Revision: https://reviews.llvm.org/D106659
2021-07-28 16:28:46 +01:00
Sanjay Patel 5b83261c15 [DivRemPairs] make sure we have a valid CFG for hoisting division
This transform was added with e38b7e8948
and as shown in:
https://llvm.org/PR51241
...it could crash without an extra check of the blocks.

There might be a more compact way to write this constraint,
but we can't just count the successors/predecessors without
affecting a test that includes a switch instruction.
2021-07-28 11:09:12 -04:00
Jeremy Morse 8612417e5a [DebugInfo][InstrRef] Don't break up ret-sequences on debug-info instrs
When we have a terminator sequence (i.e. a tailcall or return),
MIIsInTerminatorSequence is used to work out where the preceding ABI-setup
instructions end, i.e. the parts that were glued to the terminator
instruction. This allows LLVM to split blocks safely without having to
worry about ABI stuff.

The function only ignores DBG_VALUE instructions, meaning that the two
debug instructions I recently added can end terminator sequences early,
causing various MachineVerifier errors. This patch promotes the test for
debug instructions from "isDebugValue" to "isDebugInstr", thus avoiding any
debug-info interfering with this function.

Differential Revision: https://reviews.llvm.org/D106660
2021-07-28 15:56:00 +01:00
Jun Ma ca0fe3447f [InstSimplify] Simplify llvm.vscale when vscale_range attribute exists
Reduce llvm.vscale to constant based on vscale_range attribute.

Differential Revision: https://reviews.llvm.org/D106850
2021-07-28 21:41:52 +08:00
Alexey Bataev 3ad6437fcc [SLP]Fix build on MacOS, NFC. 2021-07-28 06:33:13 -07:00
Sanjay Patel 4c41caa287 [x86] improve CMOV codegen by pushing add into operands, part 3
In this episode, we are trying to avoid an x86 micro-arch quirk where complex
(3 operand) LEA potentially costs significantly more than simple LEA. So we
simultaneously push and pull the math around the CMOV to balance the operations.

I looked at the debug spew during instruction selection and decided against
trying a later DAGToDAG transform -- it seems very difficult to match if the
trailing memops are already selected and managing the creation of extra
instructions at that level is always tricky.

Differential Revision: https://reviews.llvm.org/D106918
2021-07-28 09:10:33 -04:00
Simon Pilgrim 124d586382 [X86][AVX] Move VPERM2F128 defs above VINSERTF128 defs. NFC.
This will be necessary for a future patch to lower VINSERTF128 custom folds to VPERM2F128
2021-07-28 14:02:17 +01:00
Alexey Bataev e408d1dfab [SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.

Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-07-28 05:49:06 -07:00
Wael Yehia 9559bd1990 [LTO][Legacy] Add new API to check presence of ctor/dtor functions.
On AIX, the linker needs to check whether a given lto_module_t contains
any constructor/destructor functions, in order to implement the behavior
of the -bcdtors:all flag. See
https://www.ibm.com/docs/en/aix/7.2?topic=l-ld-command for the flag's
documentation.
In llvm IR, constructor (destructor) functions are added to a special
global array @llvm.global_ctors (@llvm.global_dtors).
However, because these two symbols are artificial, they are not visited
during the symbol traversal (using the
lto_module_get_[num_symbols|symbol_name|symbol_attribute] API).

This patch adds a new function to the libLTO interface that checks the
presence of one or both of these two symbols.

Reviewed By: steven_wu

Differential Revision: https://reviews.llvm.org/D106887
2021-07-28 12:41:56 +00:00
Florian Hahn c07dd2b885
[LV] Move recurrence backedge fixup code to VPlan::execute (NFC).
As suggested in D105008, move the code that fixes up the backedge value
for first order recurrences to VPlan::execute.

Now all that remains in fixFirstOrderRecurrences is the code responsible
for creating the exit values in the middle block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D106244
2021-07-28 13:32:40 +01:00
David Green 41cedb1c9a [LV][ARM] Tighten up MLA reduction costing
This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.

 - The Arm implementation of getExtendedAddReductionCost is altered to
   only provide costs for legal or smaller types. Larger than legal types
   need to be split, which currently does not work very well, especially
   for predicated reductions where the predicate may be legal but needs to
   be split. Currently we limit it to legal or smaller input types.
 - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext))
   is a pattern that can come up, and can be treated the same as
   reduce(mul(ext, ext)) providing the extension types match.
 - And it has been adjusted to not count the ext in reduce(mul(ext, ext))
   as part of a reduce(mul) pattern.

Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.

Differential Revision: https://reviews.llvm.org/D106166
2021-07-28 12:50:58 +01:00
Chris Jackson 04b94c7cae Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR"
Crashes were reported on the upstreamm revision:
https://reviews.llvm.org/D105207

This reverts commit 796b84d26f.
2021-07-28 10:05:54 +01:00
RamNalamothu 1a8c57179a [AMDGPU] We would need FP if there is call and caller save VGPR spills
Since https://reviews.llvm.org/D98319, determineCalleeSavesSGPR() needs
to consider caller save VGPR spills as well while anticipating if we
require FP.

Fixes: SWDEV-295978

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106758
2021-07-28 11:12:55 +05:30
Valentin Clement fe7ca1a9fc [mlir][openacc] Initial translation for DataOp to LLVM IR
Add basic translation of acc.data to LLVM IR with runtime calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104301
2021-07-27 22:04:04 -04:00
Jose M Monsalve Diaz 5ab6aedda9 [OpenMP] Folding threadLimit and numThreads when single value in kernels
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033
2021-07-27 21:47:12 -04:00
Juneyoung Lee 4f71f59bf3 [DAGCombiner] Fold SETCC(FREEZE(x),const) to FREEZE(SETCC(x,const)) if SETCC is used by BRCOND
This patch adds a peephole optimization `SETCC(FREEZE(x),const)` => `FREEZE(SETCC(x,const))`
if the SETCC is only used by BRCOND.

Combined with `BRCOND(FREEZE(X)) => BRCOND(X)`, this leads to a nice improvement in the generated assembly when x is a masked loaded value.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105344
2021-07-28 09:22:15 +09:00
Xiang1 Zhang 3223d41017 [X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D106780
2021-07-28 08:16:59 +08:00
Johannes Doerfert 3dca83961c Reapply "[Attributor] Disable simplification AAs if a callback is present""
This reapplies commit cbb709e251 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.

This reverts commit aa27430a62.
2021-07-27 19:14:50 -05:00
Xiang1 Zhang 2ca3937131 Revert "[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT"
This reverts commit 6ff73efea9.
2021-07-28 08:12:29 +08:00
Xiang1 Zhang 6ff73efea9 [X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT 2021-07-28 08:08:30 +08:00
Krzysztof Parzyszek 64d5b6e373 [Hexagon] Fix resetting dead registers in DBG_VALUE_LISTs
This fixes https://llvm.org/PR51229.
2021-07-27 18:36:28 -05:00
Johannes Doerfert aa27430a62 Revert "[Attributor] Disable simplification AAs if a callback is present"
This reverts commit cbb709e251 as it
breaks the tests, which was not supposed to happen. Investigating now.
2021-07-27 18:09:42 -05:00
Johannes Doerfert fd520e75f1 [Attributor] Verify `checkForAllUses` return value properly
Also do not emit more than one remark after Heap2Stack failed.
2021-07-27 17:50:27 -05:00
Johannes Doerfert cbb709e251 [Attributor] Disable simplification AAs if a callback is present
AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at
the IR by default. If queried for a IR position which has a
simplification callback we should either look at the callback return, or
give up. We do the latter for now.
2021-07-27 17:50:26 -05:00
Mircea Trofin 935dea2cb2 [MLGO] fix silly LLVM_DEBUG misuse 2021-07-27 15:10:28 -07:00
Mircea Trofin eb76ca573d [NFC][MLGO] Debug messages for what inline advisor is selected
We already have an indication (error) if the desired inline advisor
cannot be enabled, but we don't have a positive indication. Added
LLVM_DEBUG messages for the latter.
2021-07-27 15:05:39 -07:00
Nemanja Ivanovic 778932c673 [PowerPC] Turn deprecated altivec prefetch instrs to nops on AIX
The dst/dstt/dstst/dststt instructions are nop's on all PowerPC
cores that AIX supports. The AIX assembler also does not accept
these mnemonics. Turn them into nop's on AIX (similar to dstall).
2021-07-27 15:50:02 -05:00
Sanjay Patel 156ba620b3 [x86] update stale code comment; NFC
The transform was generalized with:
1ce05ad619
2021-07-27 16:45:52 -04:00
Matt Arsenault d7d2e4545e AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9
The patterns for the m0 glue patterns were failing to import.
2021-07-27 15:56:42 -04:00
Benjamin Kramer 05815c9f63 Remove unused include that's also a layering violation. NFC. 2021-07-27 21:21:55 +02:00
Amara Emerson a11d9a1f48 [AArch64][GlobalISel] Fix constraining LDXPX intrinsic selection.
Causes a fallback because of lack of regclasses on vregs, unless its without
asserts, where we end up crashing later in codegen.
2021-07-27 12:13:56 -07:00
Enna1 1ee6559ef6 [ASAN] NFC: Remove redundant variable
`StackAlignment` has only one use: `StackAlignment = std::max(StackAlignment, AI.getAlignment());` So it is redundant.

Reviewed By: vitalybuka, MTC

Differential Revision: https://reviews.llvm.org/D106741
2021-07-27 12:02:37 -07:00
Adam Nemet d87d3615f7 [Matrix] Fix shape for factored transpose
The shape of the input is C x R.

Differential Revision: https://reviews.llvm.org/D106722
2021-07-27 11:36:13 -07:00
Adam Nemet bf7eb48454 [Matrix] RAUW should only replace an instruction in ShapeMap if supportsShapeInfo
As an instruction is replaced in optimizeTransposes RAUW will replace it in
the ShapeMap (ShapeMap is ValueMap so that uses are updated).  In
finalizeLowering however we skip updating uses if they are in the ShapeMap
since they will be lowered separately at which point we pick up the lowered
operands.

In the testcase what happened was that since we replaced the doubled-transpose
with the shuffle, it ended up in the ShapeMap.  As we lowered the
columnwise-load the use in the shuffle was not updated.  Then as we removed
the original columnwise-load we changed that to an undef.  I.e. we ended up
with:

```
%shuf = shufflevector <8 x double> undef, <8 x double> poison, <6 x i32>
                                   ^^^^^
                                  <i32 0, i32 1, i32 2, i32 4, i32 5, i32 6>
```

Besides the fix itself, I have fortified this last bit.  As we change uses to
undef when removing instruction we track the undefed instruction to make sure
we eventually remove those too.  This would have caught the issue at compile
time.

Differential Revision: https://reviews.llvm.org/D106714
2021-07-27 11:36:13 -07:00
Alexey Zhikhartsev 02077da7e7 Add jump-threading optimization for deterministic finite automata
The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

```
          sw.bb                        sw.bb
        /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
        \   |   /                /       |       \
        latch.bb             latch.2  latch.3  latch.1
         br sw.bb              /         |         \
                           sw.bb.2     sw.bb.3     sw.bb.1
                            br case2    br case3    br case1
```

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D99205
2021-07-27 14:34:04 -04:00
Craig Topper 3852b8c70f [RISCV] Select vector shl by 1 to a vector add.
A vector add may be faster than a vector shift.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106689
2021-07-27 10:57:28 -07:00
Andy Kaylor b373b5990d Enabling the copy-constant-to-alloca optimization in more instances
Patch by Mohammad Fawaz

This patch allows lifetime calls to be ignored (and later erased) if we
know that the copy-constant-to-alloca optimization is going to happen.
The case that is missed is when the global variable is in a different address
space than the alloca (as shown in the example added to the lit test.)

This used to work before 6da31fa4a6

Differential Revision: https://reviews.llvm.org/D106573
2021-07-27 10:11:43 -07:00
David Sherwood a5dd6c6cf9 [LoopVectorize] Don't interleave scalar ordered reductions for inner loops
Consider the following loop:

  void foo(float *dst, float *src, int N) {
    for (int i = 0; i < N; i++) {
      dst[i] = 0.0;
      for (int j = 0; j < N; j++) {
        dst[i] += src[(i * N) + j];
      }
    }
  }

When we are not building with -Ofast we may attempt to vectorise the
inner loop using ordered reductions instead. In addition we also try
to select an appropriate interleave count for the inner loop. However,
when choosing a VF=1 the inner loop will be scalar and there is existing
code in selectInterleaveCount that limits the interleave count to 2
for reductions due to concerns about increasing the critical path.
For ordered reductions this problem is even worse due to the additional
data dependency, and so I've added code to simply disable interleaving
for scalar ordered reductions for now.

Test added here:

  Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll

Differential Revision: https://reviews.llvm.org/D106646
2021-07-27 17:41:01 +01:00
Matt Arsenault b32d3d9e81 AMDGPU: Treat IMPLICIT_DEF like a constant lanemask source
This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop.
2021-07-27 11:44:38 -04:00
Thomas Lively 33786576fd [WebAssembly] Codegen for extmul SIMD instructions
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.

Differential Revision: https://reviews.llvm.org/D106724
2021-07-27 08:41:30 -07:00
Anirudh Prasad a8cfa4b9bd [SystemZ][z/OS] Initial code to generate assembly files on z/OS
- This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target.
- Only the .text and the .bss sections are added for now.
- The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections.
- This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target
- Further improvements and additions will be made in future patches.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D106380
2021-07-27 11:29:15 -04:00
Anna Thomas 8ee5759fd5 Strip undef implying attributes when moving calls
When hoisting/moving calls to locations, we strip unknown metadata. Such calls are usually marked `speculatable`, i.e. they are guaranteed to not cause undefined behaviour when run anywhere. So, we should strip attributes that can cause immediate undefined behaviour if those attributes are not valid in the context where the call is moved to.

This patch introduces such an API and uses it in relevant passes. See
updated tests.

Fix for PR50744.

Reviewed By: nikic, jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D104641
2021-07-27 10:57:05 -04:00
Tres Popp d225de60c9 Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI."
This reverts commit 1cfecf4fc4.

This commit broke LLVM code generated through XLA by removing a
conditional on Ld->getExtensionType() == ISD::NON_EXTLOAD

This is not a perfect revert. The new function is left as other uses of
it exist now.
2021-07-27 16:55:50 +02:00
Tres Popp 70fa9479b2 Revert "Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI.""
This reverts commit d7bbb1230a.

There were follow up uses of a deleted method and I didn't run the
tests. Undo the revert, so I can do it properly.
2021-07-27 16:48:31 +02:00