Commit Graph

152407 Commits

Caroline Concatto 2186b011e9 [Driver][AArch64]Add driver support for neoverse-512tvb target
Support for neoverse-512tvb mirrors the equivalent option in GCC [1].
The option has no functional effect yet.
This patch ensures the driver accepts "-mcpu=neoverse-512tvb" and that enough
plumbing is in place to allow the new option to be used in the future.

[1] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

Differential Revision: https://reviews.llvm.org/D112406
2021-10-28 09:08:40 +01:00
Martin Storsjö 177176f75c [Support] [Windows] Manually clean up temp files if not setting delete disposition
Since D81803 / 79657e2339, temp files
created on network shares don't set "Disposition.DeleteFile = true".
This flag normally takes care of removing the temp file both if the
process exits abnormally (either crashing or killed externally), and
when the file is closed cleanly.

For network shares, we deliberately choose not to set the flag, and if
the operation to inspect the file handle (a prerequisite to setting the
flag since 79657e2339) fails, we also error out. In both of these cases,
we can at least make sure to remove the temp files when they are closed
cleanly.

Adjust the semantics of "OF_Delete" to not set the delete
disposition, but only set the access mode for allowing deletion.
Move the call to setDeleteDisposition into TempFile::create,
where we can check if it failed, and if it did, set a flag noting
that the file should be removed manually at the end.

This does leak files on crash, but at least doesn't leak files
in regular successful runs. (Technically, the alternative codepath
could use the RemoveFileOnSignal function, but that might complicate
the TempFile implementation further.)

This fixes https://github.com/mstorsjo/llvm-mingw/issues/233 and
https://bugs.llvm.org/show_bug.cgi?id=52080.
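
A minimal sketch of the manual-cleanup fallback, with illustrative names (this is not the actual llvm::sys::fs implementation; the struct, fields and helpers are assumptions):

```
#include <cstdio>
#include <string>

// Sketch only: when the delete disposition could not be set (e.g. on a
// network share), remember to unlink the file manually on a clean close.
// Files still leak if the process crashes before close() runs.
struct TempFileSketch {
  std::string Path;
  std::FILE *F = nullptr;
  bool RemoveOnClose = false; // set when setting the disposition failed

  int close() {
    int RC = F ? std::fclose(F) : 0;
    F = nullptr;
    if (RemoveOnClose)           // manual fallback cleanup
      std::remove(Path.c_str()); // best effort
    return RC;
  }
};
```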

Differential Revision: https://reviews.llvm.org/D111875
2021-10-28 10:33:37 +03:00
Hongtao Yu 259e4c5658 [CSSPGO] Trim cold base profiles for the CS preinliner.
Add support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decisions made by the preinliner. Also, disable the existing profile merger when the preinliner is on, unless explicitly specified.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D112489
2021-10-27 22:50:27 -07:00
Hsiangkai Wang 7051f73d69 [RISCV] Sync Zvlsseg register order with vector register order.
Sync the order of Zvlsseg registers with vector registers to avoid
unnecessary register copies between vector instructions and zvlsseg
instructions.

Differential Revision: https://reviews.llvm.org/D110250
2021-10-28 13:34:53 +08:00
Kazu Hirata cee3419d65 [AMDGPU] Remove unused declaration findNumUsedRegistersSI (NFC) 2021-10-27 21:24:02 -07:00
Phoebe Wang 2bc28c6f82 [X86] Add a dependency breaking xor before any gathers with an undef passthru value.
In the instruction encoding, the passthru register is always
tied to the destination register. The CPU scheduler has to wait
for the last writer of this register to finish executing before
the gather can start. This is true even if the initial mask is
all ones so that the passthru will never be used.

By explicitly zeroing the register we can break the false
dependency. The zero idiom is executed completely by the
register renamer and so is immediately considered ready.
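
For illustration, a source pattern that produces such a gather (AVX2 intrinsics, compile with -mavx2); the dependency-breaking zero idiom is what this patch inserts before the gather:

```
#include <immintrin.h>

// A non-masked gather: the mask is all ones, so the passthru can never
// be observed, yet the encoding still ties passthru to the destination.
// The patch zeroes the destination first, roughly:
//   vpxor %xmm0, %xmm0, %xmm0
//   vpgatherdd ...
__m256i gather8(const int *base, __m256i idx) {
  return _mm256_i32gather_epi32(base, idx, /*scale=*/4);
}
```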

Authored by Craig.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D112505
2021-10-28 11:44:52 +08:00
Hsiangkai Wang 0a9b82960c [RISCV] Use vmv.v.[v|i] if we know COPY is under the same vl and vtype.
If we know the source operand of a COPY is defined by a vector instruction
with a tail agnostic policy and the same LMUL, and there is no vsetvli between
the COPY and the defining instruction that changes vl or vtype, we can use
vmv.v.v or vmv.v.i to copy the vector registers, which performs better than
a whole vector register move instruction.

If the source of the COPY is a vmv.v.i, we can use vmv.v.i for the
COPY as well.

This patch only considers instructions within a single basic block.

Case 1:
```
bb.0:
  ...
  VSETVLI          # The first VSETVLI before COPY and VOP.
  ...              # Use this VSETVLI to check LMUL and tail agnostic.
  ...
  vy = VOP va, vb  # Define vy.
  ...              # There is no vsetvli between VOP and COPY.
  vx = COPY vy
```

Case 2:
```
bb.0:
  ...
  VSETVLI          # The first VSETVLI before VOP.
  ...              # Use this VSETVLI to check LMUL and tail agnostic.
  ...
  vy = VOP va, vb  # Define vy.
  ...              # There is no vsetvli to change vl between VOP and COPY.
  ...
  VSETVLI          # The first VSETVLI before COPY.
  ...              # This VSETVLI does not change vl and vtype.
  ...
  vx = COPY vy
```

Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Co-Authored-by: Kito Cheng <kito.cheng@sifive.com>

Differential Revision: https://reviews.llvm.org/D103510
2021-10-28 11:39:04 +08:00
Max Kazantsev 513914e1f3 [SCEV] Invalidate user SCEVs along with operand SCEVs to avoid cache corruption
Following the discussion in D110390, it seems that we suffer from an inability
to traverse the users of a SCEV being invalidated. As a result, ScalarEvolution's
inner caches may store obsolete data about SCEVs even if their operands are
forgotten. This creates problems when we try to verify the contents of those caches.

Messing with the cache is also a frequent source of very sneaky and
hard-to-analyze memory-corruption bugs around cached data. They lurk there
because ScalarEvolution's verification is not powerful
enough and misses many problematic cases. I plan to make SCEV's verification
much stricter in follow-ups, and this requires dangling-pointer-free caches.

This patch makes sure that, whenever we forget cached information for a SCEV,
we also forget it for all SCEVs that (transitively) use it.

This may have negative compile time impact. It's a sacrifice we are more
than willing to make to enforce correctness. We can also save some time by
reworking invokers of forgetMemoizedResults (maybe we can forget multiple
SCEVs with a single query).
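
A generic sketch of the transitive walk this implies; the Users callback stands in for ScalarEvolution's internal user tracking, and all names are illustrative:

```
#include <functional>
#include <set>
#include <vector>

// Collect a set of nodes together with everything that (transitively)
// uses them, so all of their cached data can be dropped in one go.
template <typename T>
std::set<const T *> collectTransitiveUsers(
    std::vector<const T *> Worklist,
    const std::function<std::vector<const T *>(const T *)> &Users) {
  std::set<const T *> ToForget;
  while (!Worklist.empty()) {
    const T *S = Worklist.back();
    Worklist.pop_back();
    if (!ToForget.insert(S).second)
      continue; // already scheduled for invalidation
    for (const T *U : Users(S)) // users' cached results are stale too
      Worklist.push_back(U);
  }
  return ToForget;
}
```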

Differential Revision: https://reviews.llvm.org/D111533
Reviewed By: reames
2021-10-28 09:39:24 +07:00
Craig Topper 1387483e72 [RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI
Add new hasVInstructions() which is currently equivalent.

Replace vector uses of hasStdExtZfh/F/D with new vector specific
versions. The vector spec no longer requires that the vectors implement the
same types as scalar. It only requires that the scalar type is
the maximum size the vectors can support. This is currently
implemented using the scalar rule we were using before.

Add new hasVInstructionsI64() and begin using it to qualify code that
requires i64 vector elements.

This is all NFC for now, but we can start using this to better
implement D112408 which introduces the Zve extensions.

Reviewed By: frasercrmck, eopXD

Differential Revision: https://reviews.llvm.org/D112496
2021-10-27 19:33:48 -07:00
Johannes Doerfert acf3093117 [Attributor][FIX] Do not ignore memory writes in AAMemoryBehavior
Even if we look for `nocapture` we need to bail on escaping pointers.
The crucial thing is that we might not look at a big enough scope when
we derive the memory behavior. Thus, it might be `nocapture` in a larger
context while it is "captured" in a smaller context.
2021-10-27 21:04:32 -05:00
Johannes Doerfert 734f91441d [Attributor][NFC] Improve debug messages 2021-10-27 21:04:31 -05:00
Ard Biesheuvel d7e089f2d6 [ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed
In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register.

Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead.

This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp.
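
For illustration, the inline thread-pointer read that -mtp=cp15 is meant to produce (TPIDRURO via coprocessor 15) rather than a call to __aeabi_read_tp; this is a sketch, not the compiler's generated code:

```
// ARM/Thumb2 only: read the user read-only thread ID register TPIDRURO.
static inline void *read_tp(void) {
  void *tp;
  __asm__("mrc p15, 0, %0, c13, c0, 3" : "=r"(tp));
  return tp;
}
```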

Reviewed By: nickdesaulniers, peter.smith

Differential Revision: https://reviews.llvm.org/D112600
2021-10-27 16:42:11 -07:00
Lang Hames 20675d8f7d Revert "[ORC] Change SPSExecutorAddr serialization, SupportFunctionCall struct."
This reverts commit e32b1eee6a.

Reverting while I fix some broken unit tests.
2021-10-27 16:39:56 -07:00
Johannes Doerfert 8a4551b893 [Attributor][FIX] Use right address space to avoid assertion
When we strip and accumulate constant offsets we need to pick the right
address space such that the offset APInt has the right bit width.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112544
2021-10-27 18:22:37 -05:00
Lang Hames e32b1eee6a [ORC] Change SPSExecutorAddr serialization, SupportFunctionCall struct.
SPSExecutorAddr will now be serializable to/from ExecutorAddr, rather than
uint64_t. This improves type safety when working with serialized addresses.

Also updates the SupportFunctionCall to use an ExecutorAddrRange (rather than
a separate ExecutorAddr addr and uint64_t size field), and updates the
tpctypes::*Write data structures to use ExecutorAddr rather than
JITTargetAddress.
2021-10-27 16:20:46 -07:00
Roman Lebedev b291597112
Revert rest of `IRBuilderBase`'s short-circuiting folds
Upon further investigation and discussion,
this is actually the opposite direction from what we should be taking,
and this direction wouldn't solve the motivational problem anyway.

Additionally, some more (polly) tests have escaped being updated.
So, let's just take a step back here.

This reverts commit f3190dedee.
This reverts commit 749581d21f.
This reverts commit f3df87d57e.
This reverts commit ab1dbcecd6.
2021-10-28 02:15:14 +03:00
Michael Liao e6a4ba3aa6 [amdgpu] Handle the case where there is no scavenged register.
- When an unconditional branch is expanded into an indirect branch, if
  there is no scavenged register, an SGPR pair needs spilling to enable
  the destination PC calculation. In addition, before jumping into the
  destination, that clobbered SGPR pair needs restoring.
- As SGPRs cannot be spilled to or restored from memory directly, the
  spilling/restoring of that SGPR pair reuses the regular SGPR spilling
  support but without spilling it into memory. As the spill and
  restore points are fully controlled, we only need to spill that SGPR
  into a temporary VGPR, which needs spilling into its emergency slot.
- The target-specific hook is revised to take an additional restore block,
  where the restoring code is filled in. After that, the relaxation will
  place that restore block directly before the destination block and
  insert an unconditional branch from any fall-through block into the
  destination block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106449
2021-10-27 18:37:27 -04:00
Ben Langmuir 3d13ee2891 [ORC][ORC-RT] Enable the MachO platform for arm64
Enables the arm64 MachO platform, adds basic tests, and implements the
missing TLV relocations and runtime wrapper function. The TLV
relocations are just handled as GOT accesses.

rdar://84671534

Differential Revision: https://reviews.llvm.org/D112656
2021-10-27 13:36:03 -07:00
Nikita Popov ea7be26045 [ConstantRange] Optimize smul_sat() (NFC)
Base the implementation on the APInt smul_sat() implementation,
which is much more efficient than performing calculations in
double the bitwidth.
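
A simplified sketch of the idea (empty/wrapped-set handling omitted; this mirrors, but is not, the actual ConstantRange::smul_sat code):

```
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// Evaluate the saturating multiply only at the four signed corners of
// the two ranges, instead of multiplying in double the bit width.
ConstantRange smulSatSketch(const ConstantRange &L, const ConstantRange &R) {
  APInt C[] = {L.getSignedMin().smul_sat(R.getSignedMin()),
               L.getSignedMin().smul_sat(R.getSignedMax()),
               L.getSignedMax().smul_sat(R.getSignedMin()),
               L.getSignedMax().smul_sat(R.getSignedMax())};
  APInt Min = C[0], Max = C[0];
  for (const APInt &V : C) {
    if (V.slt(Min)) Min = V;
    if (V.sgt(Max)) Max = V;
  }
  return ConstantRange::getNonEmpty(Min, Max + 1);
}
```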
2021-10-27 21:01:09 +02:00
Nikita Popov 665060ea45 [BasicAA] Remove misleading overflow check
GEP decomposition currently checks whether the multiplication of
the linear expression offset and GEP scale overflows. However, if
everything else works correctly, this overflow check is both
unnecessary and dangerously misleading. While it will avoid an
overflow in Scale * Offset in particular, other parts of the
calculation (including those on dynamic values) may still overflow.
The code working on the decomposed GEPs is responsible for ensuring
that it remains correct in the presence of overflow. D112611 fixes
the last issue of that kind that I'm aware of (in fact, the overflow
check was originally introduced to work around precisely that issue).

Differential Revision: https://reviews.llvm.org/D112618
2021-10-27 20:56:03 +02:00
Nick Desaulniers 3ccd041af9 [LowerTypeTests] Emit cfi_jt aliases regardless of function export
A constant complaint we get is that the __typeid__ symbols in the CFI
jump tables cause confusing stack traces in applications. Emit the more
readable cfi_jt aliases regardless of function export (LTO vs Thin LTO).

Reviewed By: pcc, tejohnson

Differential Revision: https://reviews.llvm.org/D107934
2021-10-27 11:36:26 -07:00
Philip Reames 425cbbc602 [Operator] Add hasPoisonGeneratingFlags [mostly NFC]
This method parallels dropPoisonGeneratingFlags on Instruction, but is hoisted to Operator to handle constant expressions as well.

This is mostly code movement, but I did go ahead and add the inrange constexpr gep case. This had been discussed previously, but apparently never followed up on.
2021-10-27 11:25:40 -07:00
Alexey Bataev f06e332982 Revert "[SLP]Improve/fix reordering of the gathered graph nodes."
This reverts commit 64d1617d18 to fix test
instability.
2021-10-27 11:16:58 -07:00
Roman Lebedev 156f10c840
[IR] `SCEVExpander::generateOverflowCheck()`: short-circuit `umul_with_overflow`-by-one
It's a no-op, no overflow happens ever: https://alive2.llvm.org/ce/z/Zw89rZ

While generally I don't like such hacks,
we have a very good reason to do this: here we are expanding
a run-time correctness check for the vectorization,
and said `umul_with_overflow` will not be optimized out
before we query the cost of the checks we've generated.

Which means, the cost of run-time checks would be artificially inflated,
and after https://reviews.llvm.org/D109368 that will affect
the minimal trip count for which these checks are even evaluated.
And if they aren't even evaluated, then the vectorized code
certainly won't be run.

We could consider doing this in IRBuilder, but then we'd need to
also teach `CreateExtractValue()` to look into chains of `insertvalue`s,
and I'm not sure there's precedent for that.

Refs. https://reviews.llvm.org/D109368#3089809
2021-10-27 19:45:55 +03:00
Kazu Hirata 593451bd3c [X86] Remove getSETOpc (NFC)
This function seems to be unused for at least one year.
2021-10-27 09:22:31 -07:00
Kazu Hirata e6b6190ead [X86] Remove NeedsRetpoline in X86AsmPrinter (NFC)
This field seems to be unused for at least one year.
2021-10-27 09:22:29 -07:00
Kazu Hirata cc73310a81 [X86] Remove CallOperand in X86Operand (NFC)
This field seems to be unused for at least one year.
2021-10-27 09:22:27 -07:00
Alexey Bataev 64d1617d18 [SLP]Improve/fix reordering of the gathered graph nodes.
Gathered loads/extractelements/extractvalue instructions should be
checked for whether they can represent a vector reordering node too, and their
order should be taken into account for better graph reordering analysis.
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.

Differential Revision: https://reviews.llvm.org/D112454
2021-10-27 08:49:13 -07:00
David Sherwood 5d9318638e [NFC][LoopVectorize] Change getStepVector to take a Value* for the StartIdx
This patch changes the definition of getStepVector from:

  Value *getStepVector(Value *Val, int StartIdx, Value *Step, ...

to

  Value *getStepVector(Value *Val, Value *StartIdx, Value *Step, ...

because:

1. it seems inconsistent to pass some values as Value* and some as
   integer, and
2. future work will require the StartIdx to be an expression made up
   of runtime calculations of the VF.

In widenIntOrFpInduction I've changed the code to pass in the
value returned from getRuntimeVF, but the presence of the assert:

  assert(!VF.isScalable() && "scalable vectors not yet supported.");

means that currently this code path is only exercised for fixed-width
VFs and so the patch is still NFC.

Differential revision: https://reviews.llvm.org/D111882
2021-10-27 16:12:38 +01:00
Roman Lebedev ab1dbcecd6
[IR] `IRBuilderBase::CreateSelect()`: if cond is a constant i1, short-circuit
While we could emit such a tautological `select`,
it will stick around until the next instsimplify invocation,
which may happen after we count the cost of this redundant `select`.
That is precisely what happens with loop vectorization legality checks,
and it artificially increases the cost of said checks,
which is bad.

There is prior art for this in `IRBuilderBase::CreateAnd()`/`IRBuilderBase::CreateOr()`.

Refs. https://reviews.llvm.org/D109368#3089809
2021-10-27 18:01:05 +03:00
Alexey Bataev 9b12975cbf Revert "[SLP]Improve/fix reordering of the gathered graph nodes."
This reverts commit f719b794bc to fix
instability in tests.
2021-10-27 07:31:36 -07:00
Kerry McLaughlin f01fafdcd4 [SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads
PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which
results in a sign-extended load. If the type returned by getExtensionType()
for the load being promoted is something other than `NON_EXTLOAD`, we
should instead pass this to getMaskedLoad() as the extension type.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D112320
2021-10-27 14:15:41 +01:00
Alexey Bataev f719b794bc [SLP]Improve/fix reordering of the gathered graph nodes.
Gathered loads/extractelements/extractvalue instructions should be
checked for whether they can represent a vector reordering node too, and their
order should be taken into account for better graph reordering analysis.
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.

Differential Revision: https://reviews.llvm.org/D112454
2021-10-27 06:08:40 -07:00
Caroline Concatto 1137b7207d [SelectionDAG] Widening the result of INSERT_SUBVECTOR.
Widens the result and first input vector because they have the same size.
The subvector to be inserted is widened in the operand widen function.

Differential Revision: https://reviews.llvm.org/D112187
2021-10-27 13:52:25 +01:00
Nikita Popov fbc0c308d5 [BasicAA] Handle known bits as ranges
BasicAA currently tries to determine that the offset is positive by
checking whether all variable indices are positive based on known
bits, multiplied by a positive scale. However, this is incorrect
if the scale multiplication might overflow. In the modified test
case the original value is positive, but may be negative after a
left shift.

Fix this by converting known bits into a constant range and reusing
the range-based logic, which handles overflow correctly.
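
A rough sketch of the fix; fromKnownBits and the range operations are real LLVM APIs, but the surrounding function is an illustrative assumption:

```
#include "llvm/IR/ConstantRange.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Instead of "sign bit known clear times positive scale", perform the
// scale multiplication in range arithmetic, which models overflow.
bool indexProvablyNonNegative(const KnownBits &Known, const APInt &Scale) {
  ConstantRange R = ConstantRange::fromKnownBits(Known, /*IsSigned=*/true);
  return R.multiply(ConstantRange(Scale)).isAllNonNegative();
}
```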

Differential Revision: https://reviews.llvm.org/D112611
2021-10-27 14:41:31 +02:00
Daniel Kiss 894ddba1c9 Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This reverts commit da1d1a0869.
2021-10-27 14:29:35 +02:00
Sanjay Patel 6c0a2c2804 [x86] enhance mayFoldLoad to check alignment
As noted in D112464, a pre-AVX target may not be able to fold an
under-aligned vector load into another op, so we shouldn't report
that as a load folding candidate. I only found one caller where
this would make a difference -- combineCommutableSHUFP() -- so
that's where I added a test to show the (minor) regression.

Differential Revision: https://reviews.llvm.org/D112545
2021-10-27 07:54:25 -04:00
Matt fc28a2f8ce [AArch64][SVE] Combine predicated FMUL/FADD into FMA
Combine FADD and FMUL intrinsics into FMA when the result of the FMUL is an FADD operand
with only one use and both use the same predicate.
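
A hedged sketch of the shape of the match; the SVE intrinsic operand order here is an assumption, not taken from the patch:

```
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/IntrinsicsAArch64.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// An sve.fadd whose multiplicand is a single-use sve.fmul governed by
// the same predicate is a candidate for a fused multiply-add.
bool isFMACandidate(IntrinsicInst &FAdd) {
  Value *Pred = nullptr, *A, *B, *Acc;
  return match(&FAdd, m_Intrinsic<Intrinsic::aarch64_sve_fadd>(
                          m_Value(Pred),
                          m_OneUse(m_Intrinsic<Intrinsic::aarch64_sve_fmul>(
                              m_Deferred(Pred), m_Value(A), m_Value(B))),
                          m_Value(Acc)));
}
```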

Differential Revision: https://reviews.llvm.org/D111638
2021-10-27 11:41:23 +00:00
Alexandros Lamprineas 8689f5e6e7 [AArch64] Add support for the 'R' architecture profile.
This change introduces subtarget features to predicate certain
instructions and system registers that are available only on
'A' profile targets. Those features are not present when
targeting a generic CPU, which is the default processor.

In other words the generic CPU now means the intersection of
'A' and 'R' profiles. To maintain backwards compatibility we
enable the features that correspond to -march=armv8-a when the
architecture is not explicitly specified on the command line.

References: https://developer.arm.com/documentation/ddi0600/latest

Differential Revision: https://reviews.llvm.org/D110065
2021-10-27 12:32:30 +01:00
Alexey Bataev cb4feae7bd [SLP]Fix logical and/or reductions.
Need to emit select(cmp) instructions for poison-safe forms of select
ops. Currently alive reports that `Target is more poisonous than source`
for the operations we generate for such instructions.

https://alive2.llvm.org/ce/z/FiNiAA
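
For illustration, the poison-safe forms as they would be emitted through IRBuilder (a sketch, not the patch's exact code):

```
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// A select does not propagate poison from the unevaluated arm, unlike a
// plain 'and'/'or'; that is what makes these forms poison-safe.
Value *emitSafeAnd(IRBuilder<> &B, Value *X, Value *Y) {
  return B.CreateSelect(X, Y, B.getFalse()); // select i1 X, i1 Y, false
}
Value *emitSafeOr(IRBuilder<> &B, Value *X, Value *Y) {
  return B.CreateSelect(X, B.getTrue(), Y); // select i1 X, true, i1 Y
}
```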

Differential Revision: https://reviews.llvm.org/D112562
2021-10-27 04:25:20 -07:00
Nikita Popov 9bc7e543b4 [BasicAA] Make range check more precise
Make the range check more precise by calculating the range of
potentially accessed bytes for both accesses and checking whether
their intersection is empty. In that case there can be no overlap
between the accesses and the result is NoAlias.

This is more powerful than the previous approach, because it can
deal with sign-wrapped ranges. In the test case the original range
is [-1, INT_MAX] but becomes [0, INT_MIN] after applying the offset.
This is a wrapping range, so getSignedMin/getSignedMax will treat
it as a full range. However, the range excludes the elements
[INT_MIN+1, -1], which is enough to prove NoAlias with an access
at offset -1.
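
A simplified sketch of the check (access sizes assumed non-zero and in-range; the function name and framing are illustrative):

```
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// The range of bytes each access may touch is [Offset, Offset + Size).
// If the two ranges cannot intersect, the accesses cannot overlap. This
// stays correct for sign-wrapped ranges such as [0, INT_MIN).
bool accessesProvablyDisjoint(const ConstantRange &Off1, uint64_t Size1,
                              const ConstantRange &Off2, uint64_t Size2) {
  unsigned BW = Off1.getBitWidth();
  ConstantRange R1 = Off1.add(ConstantRange(APInt(BW, 0), APInt(BW, Size1)));
  ConstantRange R2 = Off2.add(ConstantRange(APInt(BW, 0), APInt(BW, Size2)));
  return R1.intersectWith(R2).isEmptySet();
}
```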

Differential Revision: https://reviews.llvm.org/D112486
2021-10-27 12:40:58 +02:00
Jay Foad b9e3af124b [LiveInterval] Add RemoveDeadValNo argument to removeSegment(iterator)
Add an optional bool RemoveDeadValNo argument to the
removeSegment(iterator) overload, for consistency with the other
overloads. This gives clients a way to remove dead valnos while also
getting an updated iterator returned (in the manner of vector::erase).

Use this to clean up some inefficient code in
LiveIntervals::repairOldRegInRange. NFC.

Differential Revision: https://reviews.llvm.org/D110560
2021-10-27 09:43:32 +01:00
Daniel Kiss da1d1a0869 [ARM] __cxa_end_cleanup should be called instead of _UnwindResume.
ARM EHABI[1] specifies that __cxa_end_cleanup be called after cleanup.
It will in turn call _Unwind_Resume.
__cxa_begin_cleanup is called from libcxxabi, while __cxa_end_cleanup is never called.
This triggers a termination when a foreign exception is processed while _Unwind_Resume is called,
because the global state is wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-27 10:40:00 +02:00
David Sherwood 3d706c20f8 [NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor
I have removed LoopVectorizationPlanner::setBestPlan, since this
function is quite aggressive: it deletes all other plans
except the one containing the <VF,UF> pair required. The code is
currently written to assume that all <VF,UF> pairs will live in the
same vplan. This is overly restrictive, since scalable VFs live in
different plans to fixed-width VFs. When we add support for
vectorising epilogue loops when the main loop uses scalable vectors,
the vplan for the main loop will be different from the one for the
epilogue.

Instead I have added a new function called

  LoopVectorizationPlanner::getBestPlanFor

that returns the best vplan for the <VF,UF> pair requested and leaves
all the vplans untouched. We then pass this best vplan to

  LoopVectorizationPlanner::executePlan

which now takes an additional VPlanPtr argument.

Differential revision: https://reviews.llvm.org/D111125
2021-10-27 09:38:27 +01:00
Arthur Eubanks ae27c57b18 [InferAddressSpaces] Make pass work with opaque pointers
Avoid getPointerElementType().
2021-10-26 23:53:20 -07:00
Kazu Hirata 6af3e87d2d [Hexagon] Remove set-but-unused variables (NFC) 2021-10-26 23:38:15 -07:00
Phoebe Wang eb55c1f153 [X86][NFC] Add the missing `break;` for 79f9dfef0d 2021-10-27 13:58:31 +08:00
Craig Topper 2783a5cfaf [RISCV] Add ICmp and FCmp to shouldSinkOperands. 2021-10-26 22:23:54 -07:00
Lang Hames 91434d4469 [JITLink] Fix element-present check in MachOLinkGraphParser.
Not all symbols are added to the index-to-symbol map, so we shouldn't use the
size of the map as a proxy for the highest valid index.
2021-10-26 20:48:40 -07:00
Lang Hames bfb40e83ee [ORC] Don't try to perform empty deallocations. 2021-10-26 20:48:40 -07:00
Max Kazantsev 5961f0308f [SCEV][NFC] Verify integrity of SCEVUsers
Make sure that, for every living SCEV, we have all its direct
operands tracking it as their user.

Differential Revision: https://reviews.llvm.org/D112402
Reviewed By: reames
2021-10-27 09:54:49 +07:00
Ben Shi 97e52e1c35 [RISCV] Optimize immediate materialisation with SLLI.UW in the Zba extension
Simplify "LUI+SLLI+ADDI+SLLI" and "LUI+ADDIW+SLLI+ADDI+SLLI" to
"LUI+ADDIW+SLLIUW" to reduce total instruction amount.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111933
2021-10-27 02:48:38 +00:00
David Blaikie 3ac709b6ce llvm-dwarfdump --verify: Exit non-zero on simplified template name rebuilding failures 2021-10-26 15:57:16 -07:00
Austin Kerbow 02e60f2e77 [AMDGPU] Use max waves for scheduler's initial occupancy target
The scheduler should set critical/excess register usage thresholds that
are guided by the maximum possible occupancy for the function. This
change is focused on setting proper lower bounds on register usage which
we would typically only see when a specific number of maximum waves is
requested with the "waves-per-eu" attribute, or by setting
"amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a
follow-on patch that will address issues with the scheduler not
targeting correct upper bounds on register usage which is typical with
launch bounds and min "waves-per-eu".

Changes by this patch:

Set the initial critical register usage thresholds to minimum values
that are determined by the maximum possible occupancy for the function,
or the number of allocatable registers, whichever is lower.

Avoid unsigned overflow if register limits are lower than the register
tracking "ErrorMargin", i.e. when using stress-regalloc=2.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112373
2021-10-26 15:30:26 -07:00
Yuanfang Chen 7c3fa52785 [DebugInfo] Skip ODRUniquing for mismatched tags
Otherwise, ODRUniquing would map some member method/variable MDNodes
to have enum type DIScope, resulting in invalid debug info and bad
DWARF.

- Add a Verifier check that a 'scope:' operand, when it is an ODR type, is not an enum.
- Make ODRUniquing apply only to ODR types with the same tag, so that the debuginfo/DWARF is well-formed.

Reviewed By: probinson, aprantl

Differential Revision: https://reviews.llvm.org/D111770
2021-10-26 15:28:25 -07:00
Sanjay Patel acabad9ff6 [InstCombine] try to canonicalize icmp with trunc op into mask and cmp
The motivating test is based on:
https://llvm.org/PR52260

We have better analysis for X == 0, so try harder to form that.
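
The underlying identity, checked here over a sample of values in plain C++ (independent of the InstCombine code itself):

```
#include <cassert>
#include <cstdint>

// Comparing a truncated value against zero equals masking the low bits
// of the wide value and comparing against zero:
//   (trunc i32 %x to i8) == 0  <=>  (%x & 0xff) == 0
int main() {
  for (uint32_t i = 0; i < (1u << 16); ++i) {
    uint32_t x = i * 2654435761u; // scatter test values across i32
    assert(((uint8_t)x == 0) == ((x & 0xffu) == 0));
  }
}
```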
2021-10-26 17:43:28 -04:00
Fangrui Song 226465efe3 [ARC] Fix `undefined symbol: llvm::MachineFunction::dump() const` 2021-10-26 11:44:18 -07:00
Usman Nadeem da1318ccca [NFC][Instcombine] Cleanup some obsolete matches in visitSelectInst
These are now redundant after https://reviews.llvm.org/D106872

Change-Id: I82edfedf1d45cac4e3368d77ce3a48c78e342c19
2021-10-26 10:07:08 -07:00
Rosie Sumpter b716d0aa94 [LoopVectorize] Clean up VPReductionRecipe::execute. NFC
Use RdxDesc->getOpcode instead of getUnderlyingInstr()->getOpcode.
Move the code which finds Kind and IsOrdered to be outside the for loop,
since neither of these changes with the vector part.

Differential Revision: https://reviews.llvm.org/D112547
2021-10-26 17:18:25 +01:00
Kazu Hirata c3e698e2f5 [CodeGen, Hexagon] Use MachineBasicBlock::phis (NFC) 2021-10-26 09:01:29 -07:00
Jonas Paulsson bb506938be [SystemZ] Improvement of emitMemMemWrapper()
It was discovered that an extra register COPY remained when expanding a
(variable length) memory operation with a loop and there was another use of
the involved address register(s) afterwards.

A simple fix for this is to COPY the address registers before the loop and
use that new vreg instead.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D112065
2021-10-26 17:03:01 +02:00
Alexey Bataev ce14d1b690 [SLP]Do not reorder reduction nodes.
The final reduction nodes should not be reordered, since the order does not
matter for reductions. Also, it might be profitable to vectorize smaller
reduction trees, as the reduction cost may compensate for the small tree cost.

Part of D111574

Differential Revision: https://reviews.llvm.org/D112467
2021-10-26 07:41:24 -07:00
zhijian 158083f0de [AIX][XCOFF] Parse the XCOFF object file auxiliary header
Summary:

The patch supports parsing the XCOFF object file auxiliary header in llvm-readobj with the option "auxiliary-headers".

The format of the auxiliary header is described at
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/filesreference/XCOFF.html#XCOFF__fyovh386shar

Reviewers: James Henderson, Jason Liu, Hubert Tong, Esme yi, Sean Fertile.

Differential Revision: https://reviews.llvm.org/D82549
2021-10-26 10:40:25 -04:00
Neubauer, Sebastian eb16570ab0 [AMDGPU] Remove unused CSR defs
CSR_AMDGPU_VGPRs_24_255 and CSR_AMDGPU_VGPRs_32_255 are not used
anywhere, so remove them.

Differential Revision: https://reviews.llvm.org/D112535
2021-10-26 16:01:49 +02:00
Abinav Puthan Purayil 61e3b9fefe [AMDGPU] Add constrained shift pattern matches.
The motivation for this is clang's conformance to
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift
which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b`
in OpenCL, where a is an int of bit width <width>.

Differential revision: https://reviews.llvm.org/D110231
2021-10-26 19:07:19 +05:30
Chen Zheng 631f44f338 [PowerPC] use right extend type for SCEV
Fix an issue caused by D108750

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D112502
2021-10-26 13:32:03 +00:00
Abinav Puthan Purayil 781dd39b7b [AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare.
We were bailing out of creating 24-bit muls for results wider than 32
bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this
change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul
correctly.

Differential Revision: https://reviews.llvm.org/D112395
2021-10-26 18:53:07 +05:30
Abinav Puthan Purayil 9bd5cfeb1f [AMDGPU] Implement llvm.amdgcn.mulhi.[i,u]24 intrinsics.
These intrinsics map to the 24-bit v_mul_hi instructions.

This change also fixes an incorrect assumption on the associativity of
24-bit mulhi in its SDNode record in tblgen.

Differential Revision: https://reviews.llvm.org/D112394
2021-10-26 18:53:07 +05:30
Sanjay Patel 2ab0148c14 [x86] use cast instead of dyn_cast for unchecked usage; NFC
This was noted as an independent clean-up in D112464.
2021-10-26 08:20:19 -04:00
Neubauer, Sebastian 487f15603e [AMDGPU] Fix setcc combine for i128
The combine asserted if constants could not be represented as uint64_t.
Use APInts to fix this.

Differential Revision: https://reviews.llvm.org/D112416
2021-10-26 13:39:50 +02:00
Jay Foad c8e5aef1a0 [AMDGPU] Use standard MachineBasicBlock::getFallThrough method. NFCI.
Differential Revision: https://reviews.llvm.org/D101825
2021-10-26 12:07:54 +01:00
Jonas Paulsson 9f8872779a [SystemZ] Provide size values for PATCHPOINT, STACKMAP and FENTRY_CALL.
All instructions must have a correct size value close to emission when
SystemZLongBranch runs, or a necessary branch relaxation may be missed.

This patch also adds an assert for instruction sizes in SystemZLongBranch.

Review: Ulrich Weigand
2021-10-26 12:07:22 +02:00
Nikita Popov 11a8423dab [SCEV] Use reverse() (NFC) 2021-10-26 11:08:58 +02:00
Max Kazantsev 9bbfe0f72c [NFC] Remove obsolete simplifyOnceImpl function
The function simplifyOnce only calls simplifyOnceImpl and does nothing else.
Having this separate helper makes no sense. Removing it.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D112517
Reviewed By: mkazantsev
2021-10-26 13:51:42 +07:00
Max Kazantsev d4c74cd4e8 [NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop
When peeling a loop, we assume that the latch has a `br` terminator and that
all loop exits are either terminated with an `unreachable` or have a terminating
deoptimize call. So when we peel off the 1st iteration, we change the IDom of
all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now,
but if we add logic to support loops with exits that are followed by a block
with an `unreachable` or a terminating deoptimize call, changing the exit's idom
wouldn't be enough and DT would be broken.

For example, let `Exit1` and `Exit2` be loop exits, and each of them
unconditionally branches to the same `unreachable`-terminated block. So neither
of the exits dominates this unreachable block. If we change the IDoms of the
exits to some peeled loop block, we don't update the dominators of the unreachable
block. Currently we just don't get to the peeling logic, saying that we can't peel
such loops.

Previously we stored exits' IDoms in a map before peeling a loop and then, after
peeling off one iteration, we changed their IDoms.
Now we use the same logic not only for exits but for all non-loop blocks dominated
by the loop.
So when we add logic to support peeling loops with exits which branch, for example,
to an unreachable-terminated block, we would update the IDoms not only for the exits,
but also for their successors.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev, nikic
2021-10-26 13:09:07 +07:00
Phoebe Wang 79f9dfef0d [X86] Move splat addends from the gather/scatter index operand to the base address
This can avoid a vector add and a constant pool load. Or an explicit broadcast in case of non-constant.

Also reverse the transform any time we encounter a constant index addend that can't be moved to base. In that case pull the constant from base into the index. This reduces code size needed for the displacement since we needed the index add anyway. Limit this to scale of 1 to avoid divisibility and wrap issues.

Authored by Craig.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111595
2021-10-26 12:35:57 +08:00
Duncan P. N. Exon Smith b12a864c29 Bitcode: Use Expected<T>::takeError() and moveInto() more, NFC
Avoid naming some Expected<T> values in the Bitcode reader by using
takeError() and moveInto() more often. This follows the smaller set of
changes included in 2410fb4616.
2021-10-25 16:03:40 -07:00
Zarko Todorovski e9163660b1 [PPC][LLVM] Inclusive terms: remove references to sanity check in lib/Target/PowerPC
Removed references to `sanity check` in `PPCBranchCoalescing.cpp` code comments.
No word substitution was made in this case, as the comments and the code
following them are sufficient IMO.

Reviewed By: quinnp

Differential Revision: https://reviews.llvm.org/D112452
2021-10-25 18:13:54 -04:00
Craig Topper d51e3a2139 [LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy.
getShiftAmountTyForConstant is a special helper that changes the
shift amount to i32 if the type chosen by
TargetLowering::getShiftAmountTy can't represent all possible values.
This is needed to satisfy an assert in SelectionDAG::getNode.

It requires additional consideration to know when this helper should be used.
I'm not sure that we are always using it when we should.

This patch merges the getShiftAmountTyForConstant handling into
TargetLowering::getShiftAmountTy so we don't need to think about it
anymore.

Technically this may slightly increase compile times since the majority
of callers of getShiftAmountTy won't need this. Hopefully, this isn't
an issue in practice.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112469
2021-10-25 14:06:53 -07:00
Nikita Popov 3a995c918e [SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander
Always insert values into ExprValueMap, and instead skip using them
in SCEVExpander if poison-generating flags have been lost. This
ensures that all values that are in ValueExprMap are also in
ExprValueMap, so we can use the latter to invalidate the former.

This change is probably not entirely NFC for the case where
originally the SCEV had no nowrap flags but they were inferred
later, in which case that would now allow reusing the existing
value for expansion.

Differential Revision: https://reviews.llvm.org/D112389
2021-10-25 22:37:20 +02:00
Arthur Eubanks 4a9db7367d [AlwaysInliner] Invalidate analyses when we delete functions
Fixes PR52292.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D112473
2021-10-25 13:36:32 -07:00
Zarko Todorovski 9769e97c35 [LLVM] Inclusive terms: remove/replace references to sanity in RewriteStatepointsForGC.cpp and test
Part of work to have the LLVM backend to use more inclusive terms.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D112461
2021-10-25 16:17:41 -04:00
Stefan Gränitz cdb335ffaf [JITLink] Fix warning 'shift count exceeds width' in AArch64 backend 2021-10-25 20:44:07 +02:00
Jeremy Morse 4136897bd4 [DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar
There are a few STL containers hanging around that can become DenseMaps,
SmallVectors and similar. This recovers a modest amount of compile time
performance.

While I'm here, adjust the bit layout of ValueIDNum: this was always
supposed to act like a value type, however it seems that clang doesn't
compile the comparison functions to act that way. Add a uint64_t to a
union that explicitly aliases the bitfields, so that we can compare the
whole value as a single integer.

Differential Revision: https://reviews.llvm.org/D112333
2021-10-25 18:07:17 +01:00
Wouter van Oortmerssen 5694dbccc3 [WebAssembly] support Memory64 in target_features section
Differential Revision: https://reviews.llvm.org/D112266
2021-10-25 09:31:45 -07:00
Jeremy Morse 97ddf49e43 [DebugInfo][InstrRef] Recover stack-slot tracking performance
This patch is like D111627 -- instead of calculating IDF for every location
on the stack, only do it for the smallest units of interference, and copy
the PHIs for those units to any aliases.

The test added runs placeMLocPHIs directly, and tests that:
 * A def of the lower 8 bits of a stack slot causes all aliasing regs to
   have PHIs placed,
 * It doesn't cause the equivalent location to x86's $ah, which isn't
   aliased, to have a PHI placed.

Differential Revision: https://reviews.llvm.org/D112324
2021-10-25 17:31:09 +01:00
Philip Reames f82cf6187f [indvars] Fix pr52276 (missing one use check)
The recently added logic to canonicalize exit conditions to unsigned relies on facts which hold about the use (i.e. the exit test). Applying this blindly to the icmp is not legal, as there may be another use which never reaches the exit. Restrict ourselves to the case where we have a single use.
2021-10-25 09:26:55 -07:00
Craig Topper e2b7aabb57 [RISCV] Reduce the number of RISCV vector builtins by an order of magnitude.
All but 2 of the vector builtins are only used by clang_builtin_alias.
When using clang_builtin_alias, the type string of the builtin is never
checked. Only the types in the function definition used for the alias
are checked.

This patch takes advantage of this to share a single builtin for
many different types. We already used type overloads on the IR intrinsic
so the codegen for the builtins that are being merge were already
the same. This extends the type overloading to the builtins.

I had to make a few tweaks to make this work.
- Floating point vector-vector vmerge now uses the vmerge intrinsic
  instead of the vfmerge intrinsic. New isel patterns and tests are
  added to support this.
- The SemaChecking for the immediate of vset_v/vget_v has been removed.
  Determining the valid range is harder now. I've added masking to
  ManualCodegen to ensure valid IR for invalid input.

This reduces the number of builtins from ~25000 to ~1100.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D112102
2021-10-25 09:03:59 -07:00
Craig Topper 210b586a85 [RISCV] Add vcsr CSR name for V extension.
Reviewed By: frasercrmck, kito-cheng

Differential Revision: https://reviews.llvm.org/D112342
2021-10-25 08:56:25 -07:00
Danila Malyutin 7b102fcc91 [CodeGen] Fix dependence breaking for tied operands
Differential Revision: https://reviews.llvm.org/D107582
2021-10-25 18:52:27 +03:00
Jeremy Morse ee3eee71e4 [DebugInfo][InstrRef] Track values fused into stack spills
During register allocation, some instructions can have stack spills fused
into them. It means that when vregs are allocated on the stack we can
convert:

    SETCCr %0
    DBG_VALUE %0

to

    SETCCm %stack.0
    DBG_VALUE %stack.0

Unfortunately instruction referencing finds this harder: a store to the
stack doesn't have a specific operand number, therefore we don't substitute
the old operand for a new operand, and the location is dropped. This patch
implements a solution: just recognise the memory operand attached to an
instruction with a Special Number (TM), and record a substitution between
the old value and the new one.

This patch adds substitution code to InlineSpiller to record such fused
spills, and tracking in InstrRefBasedLDV to recognise such values, and
produce the value numbers for them. Everything to do with the movement of
stack-defined values is already handled in InstrRefBasedLDV.

Differential Revision: https://reviews.llvm.org/D111317
2021-10-25 15:14:53 +01:00
Danila Malyutin 2d9ee590b6 [AArch64] Handle ST1iN instructions in isAArch64FrameOffsetLegal
Before this change, the code would crash with "unhandled opcode in
isAArch64FrameOffsetLegal" when there was a spill from extractelement.
Fixes pr52249

Differential Revision: https://reviews.llvm.org/D112311
2021-10-25 17:05:12 +03:00
Nikita Popov 0d20ebf686 [BasicAA] Use ranges for more than one index
D109746 made BasicAA use range information to determine the
minimum/maximum GEP offset. However, it was limited to the case of
a single variable index. This patch extends support to multiple
indices by adding all the ranges together.
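
A sketch of the accumulation; taking an ArrayRef of already-scaled per-index ranges is an assumption about the surrounding code, not its actual shape:

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// With several variable indices, the total offset range is the sum of
// each index's scaled range; add() models wrapping, keeping this sound.
ConstantRange totalOffsetRange(ArrayRef<ConstantRange> ScaledIndexRanges,
                               unsigned BitWidth) {
  ConstantRange Sum(APInt(BitWidth, 0));
  for (const ConstantRange &R : ScaledIndexRanges)
    Sum = Sum.add(R);
  return Sum;
}
```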

Differential Revision: https://reviews.llvm.org/D112378
2021-10-25 15:30:50 +02:00
Alexey Bataev eb9b75dd4d [SLP]Change the order of the reduction/binops args pair vectorization attempts.
Need to change the order of the reduction/binops args pair vectorization
attempts: try to find the reduction first and postpone the
vectorization of the binops args. This may help to find more reduction
patterns and vectorize them.
Part of D111574.

Differential Revision: https://reviews.llvm.org/D112224
2021-10-25 06:27:14 -07:00
Jeremy Morse 2eb96e1711 [DebugInfo][NFC] Avoid a use-after-free
This patch swaps two lines -- the CurSucc reference can be invalidated
by the call to DFS.push_back, therefore that should happen last. The
usual hat-tip to asan for catching this.

This patch also swaps an earlier call to ToAdd.insert and DFS.push_back,
where a stable iterator (from successors()) is being used. This isn't
strictly necessary, but is good for consistency and avoids readers
asking themselves why the two code portions have a different order.
2021-10-25 14:16:30 +01:00
Sanjay Patel 6e46b66e2a [DAGCombiner] make matching bit-hack form of usubsat more flexible
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

As suggested in D112085, we can substitute 'xor' with 'add'
in this pattern, and it is logically equivalent:
https://alive2.llvm.org/ce/z/eJtWWC

We canonicalize to 'xor' in IR, but SDAG does not do that
(and it probably should not - https://llvm.org/PR52267 ), so
it is possible to see either pattern in codegen. Note that
'sub' is a another potential pattern, but that is
canonicalized to 'add' in DAGCombiner, so we don't need to
worry about that variation.
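
The matched identity and its 'add' variant, verified exhaustively for i8 in plain C++ (independent of the DAG code):

```
#include <cassert>
#include <cstdint>

// (x ^ 128) & (x s>> 7) == usubsat(x, 128); adding 128 (mod 256) flips
// the same sign bit the xor does, so the 'add' form matches as well.
int main() {
  for (unsigned v = 0; v < 256; ++v) {
    uint8_t x = (uint8_t)v;
    uint8_t sra = (uint8_t)(int8_t(x) >> 7);              // 0x00 or 0xff
    uint8_t viaXor = (uint8_t)(x ^ 0x80u) & sra;
    uint8_t viaAdd = (uint8_t)(x + 0x80u) & sra;
    uint8_t sat = x >= 0x80u ? (uint8_t)(x - 0x80u) : 0;  // usubsat(x, 128)
    assert(viaXor == sat && viaAdd == sat);
  }
  return 0;
}
```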

Differential Revision: https://reviews.llvm.org/D112377
2021-10-25 09:01:52 -04:00
Tim Northover f9089accba CodeGenPrep: remove all copies of GEP from list if there are duplicates.
Unfortunately ToT has changed enough from the revision where this actually
caused problems that the test no longer triggers an assertion failure.
2021-10-25 14:00:02 +01:00
Kerry McLaughlin 1f49b71fe5 [SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt
This patch enables the use of reciprocal estimates for SVE
when both the -Ofast and -mrecip flags are used.

Reviewed By: david-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D111657
2021-10-25 11:30:44 +01:00
Max Kazantsev a9b0776a81 [SimplifyCFG] Sanity assert in iterativelySimplifyCFG
We observe a hang within iterativelySimplifyCFG due to infinite
loop execution. Currently, there is no limit on this loop, so
in case of a bug it just runs forever. This patch adds an assert
that will break the loop after 1000 iterations if it hasn't converged.
2021-10-25 17:10:17 +07:00
Nikita Popov 75384ecdf8 [InstSimplify] Refactor invariant.group load folding
Currently strip.invariant/launder.invariant are handled by
constructing constant expressions with the intrinsics skipped.
This takes an alternative approach of accumulating the offset
using stripAndAccumulateConstantOffsets(), with a flag to look
through invariant.group intrinsics.

Differential Revision: https://reviews.llvm.org/D112382
2021-10-25 10:56:25 +02:00
Florian Hahn a6c4969f5f
[VPlan] Do not create dummy entry block (NFC).
At the moment a dummy entry block is created at the beginning of VPlan
construction. This dummy block is later removed again.

This means it is not easy to identify the VPlan header block in a
general fashion, because during recipe creation it is the single
successor of the entry block, while later it is the entry block.

To make getting the header easier, just skip creating the dummy block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111299
2021-10-25 09:52:58 +01:00
Jingu Kang a502436259 [AArch64] Remove redundant ORRWrs which is generated by zero-extend
%3:gpr32 = ORRWrs $wzr, %2, 0
%4:gpr64 = SUBREG_TO_REG 0, %3, %subreg.sub_32

If the 32-bit form of an AArch64 instruction defines the source operand of the ORRWrs,
we can remove the ORRWrs because the upper 32 bits of the source operand are
set to zero.

Differential Revision: https://reviews.llvm.org/D110841
2021-10-25 09:47:07 +01:00
Nikita Popov 477551fd09 [SCEVExpander] Minor cleanup in value reuse (NFC)
Use dyn_cast_or_null and convert one of the checks into an
assertion. SCEV is a per-function analysis.
2021-10-25 10:32:17 +02:00
Kazu Hirata 3729a5abf4 [SCEV] Fix a warning on an unused lambda capture
This patch fixes:

  llvm/lib/Analysis/ScalarEvolution.cpp:12770:37: error: lambda
  capture 'this' is not used [-Werror,-Wunused-lambda-capture]
2021-10-25 00:45:18 -07:00
Max Kazantsev f8623b0783 [SCEV][NFC] Win some compile time from mass forgetMemoizedResults
Mass forgetMemoizedResults can be done more efficiently than a bunch
of individual invocations of the helper, because we can traverse the maps being
updated just once, rather than doing this for each individual SCEV.

Should be NFC and supposedly improves compile time.

Differential Revision: https://reviews.llvm.org/D112294
Reviewed By: reames
2021-10-25 14:09:41 +07:00
Max Kazantsev dbab339ea4 [SCEV][NFC] Apply mass forgetMemoizedResults queries where possible
When forgetting multiple SCEVs, rather than doing this one by one, we can
instead use mass updates. We plan to make them more efficient than they
are now, potentially improving compile time.

Differential Revision: https://reviews.llvm.org/D111602
Reviewed By: reames
2021-10-25 13:50:49 +07:00
Max Kazantsev a6096b7f9e [SCEV][NFC] Introduce API for mass forgetMemoizedResults query
This patch changes the signature of forgetMemoizedResults to be able to work with
multiple SCEVs. Usage will come in follow-ups. We also plan to optimize it in the
future to work faster than individual invalidation updates. This should not change
behavior in any sense.

Split-off from D111602.

Differential Revision: https://reviews.llvm.org/D112293
Reviewed By: reames
2021-10-25 13:49:31 +07:00
Max Kazantsev 1c18ebb2cc [NFC][SCEV] Do not track users of SCEVConstants
Follow-up from D112295, suggested by Nikita: we can avoid tracking
users of SCEVConstants because dropping their cached info is unlikely
to give any new prospects for fact inference, and it should not introduce
any correctness problems.
2021-10-25 12:30:46 +07:00
Max Kazantsev fea4a48c0b [SCEV][NFC] API for tracking of SCEV users
This patch introduces an API that keeps track of which SCEVs use
other SCEVs, as required to handle invalidation of users along
with operands, which comes in follow-up patches.

Differential Revision: https://reviews.llvm.org/D112295
Reviewed By: reames
2021-10-25 12:14:18 +07:00
Chen Zheng 80e6aff6bb [PowerPC] common chains to reuse offsets to reduce register pressure.
Add a new preparation pattern in the PPCLoopInstFormPrep pass to reduce register
pressure.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D108750
2021-10-25 03:27:16 +00:00
Kazu Hirata 9800731367 [Target, Transforms] Use predecessors instead of pred_begin and pred_end (NFC) 2021-10-24 17:35:35 -07:00
Kazu Hirata 4bd46501c3 Use llvm::any_of and llvm::none_of (NFC) 2021-10-24 17:35:33 -07:00
Matthias Braun 4b75d674f8 X86InstrInfo: Look across basic blocks in optimizeCompareInstr
This extends `optimizeCompareInstr` to continue the backwards search
when it reached the beginning of a basic block. If there is a single
predecessor block then we can just continue the search in that block and
mark the EFLAGS register as live-in.

Differential Revision: https://reviews.llvm.org/D110862
2021-10-24 16:22:45 -07:00
Matthias Braun 683994c863 X86InstrInfo: Refactor and cleanup optimizeCompareInstr
Previously, the first part of `optimizeCompareInstr` was split into a
loop with a forward scan, for cases that re-use zero flags from a
producer in case of a compare with zero, and a backward scan for finding an
instruction equivalent to a compare.

The code now uses a single backward scan searching for the next
instructions that reads or writes EFLAGS.

Also:
- Add comments giving examples for the 3 cases handled.
- Check `MI` which contains the result of the zero-compare cases,
  instead of re-checking `IsCmpZero`.
- Tweak coding style in some loops.
- Add new MIR based tests that test the optimization in isolation.

This also removes a check for flag readers in situations like this:
```
= SUB32rr %0, %1, implicit-def $eflags
...  we no longer stop when there are $eflag users here
CMP32rr %0, %1   ; will be removed
...
```

Differential Revision: https://reviews.llvm.org/D110857
2021-10-24 16:22:45 -07:00
Philip Reames a461fa64bb Treat branch on poison as immediate UB (under an off by default flag)
The LangRef clearly states that branching on an undef or poison value is immediate undefined behavior, but historically, we have not been consistent about implementing that interpretation in the optimizer. Historically, we used (in some cases) a more relaxed model which essentially looked for provable UB along both paths which was control dependent on the condition. However, we've never been 100% consistent here. For instance SCEV uses the strong model for increments which form AddRecs (and only AddRecs).

At the moment, the last big blocker for finally making this switch is enabling the fix landed in D106041. Loop unswitching (in its classic form) is incorrect as it creates many "branch on poison" instances when unswitching conditions originally unreachable within the loop.

This change adds a flag to value tracking which allows to easily test the optimization potential of treating branch on poison as immediate UB. It's intended to help ease work on getting us finally through this transition and avoid multiple independent rediscovers of the same issues.

Differential Revision: https://reviews.llvm.org/D112026
2021-10-24 14:42:03 -07:00
Philip Reames 3c06ecaa1e [instcombine] Fix oss-fuzz 39934 (mul matcher can match non-instruction)
Fixes a crash observed by oss-fuzz in 39934. The issue at hand is that the code expects a pattern match on m_Mul to imply the operand is a mul instruction; however, mul constexprs are also valid here.
2021-10-24 14:42:03 -07:00
Fangrui Song 54405a49d8 [ARC] Fix -Wunused-variable. NFC 2021-10-24 10:31:44 -07:00
Kazu Hirata 1c35973c77 [llvm] Call *(Set|Map)::erase directly (NFC)
We can erase an item in a set or map without checking its membership
first.
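
The shape of the simplification, shown with std::set (the LLVM set/map containers behave the same way):

```
#include <set>

// erase(key) is already a no-op when the key is absent, so the
// membership pre-check is redundant.
void removeItem(std::set<int> &S, int X) {
  // Before: if (S.count(X)) S.erase(X);
  S.erase(X);
}
```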
2021-10-24 09:32:59 -07:00
Simon Pilgrim b09f2ee57c [X86] findEltLoadSrc - fix shift amount variable name. NFCI.
Fix the copy + paste, renaming shift amt from Idx to Amt
2021-10-23 21:24:37 +01:00
Nikita Popov 0c7f85d786 [InstSimplify] Simplify fetching of index size (NFC)
Directly fetch the size instead of going through the index type
first.
2021-10-23 22:08:15 +02:00
Nikita Popov 710596a1e1 [ConstantFolding] Accept offset in ConstantFoldLoadFromConstPtr (NFCI)
As this API is now internally offset-based, we can accept a starting
offset and remove the need to create a temporary bitcast+gep
sequence to perform an offset load. The API now mirrors the
ConstantFoldLoadFromConst() API.
2021-10-23 17:59:39 +02:00
Kazu Hirata d8e4170b0a Ensure newlines at the end of files (NFC) 2021-10-23 08:45:29 -07:00
Kazu Hirata d14d7068b6 [llvm] Use StringRef::contains (NFC) 2021-10-23 08:45:27 -07:00
Nikita Popov c5b5b7f621 [ConstantFolding] Remove ConstantFoldLoadThroughGEPIndices() API (NFC)
The last user of this API went away in
4f5e9a2bb2.
2021-10-23 16:59:29 +02:00
Nikita Popov 4f5e9a2bb2 [SCEV] Remove computeLoadConstantCompareExitLimit() (NFCI)
The functionality of this method is already covered by
computeExitCountExhaustively() in a more general fashion. It was
added at a time when exhaustive exit count calculation did not
support constant folding loads yet. I double checked that dropping
this code causes no binary changes in test-suite.

Differential Revision: https://reviews.llvm.org/D112343
2021-10-23 15:34:25 +02:00
Jessica Clarke 2d8c18fbbd [X86] Don't add implicit REP prefix to VIA PadLock xstore
Commit 8fa3e8fa14 added an implicit REP prefix to all VIA PadLock
instructions, but GNU as doesn't add one to xstore, only to all the others.
This resulted in a kernel panic regression in FreeBSD upon updating to
LLVM 11 (https://bugs.freebsd.org/259218) which includes the commit in
question. This partially reverts that commit.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112355
2021-10-23 01:57:17 +01:00
Nikita Popov 61cfdf636d [BasicAA] Model implicit trunc of GEP indices
GEP indices larger than the GEP index size are implicitly truncated
to the index size. BasicAA currently doesn't model this, resulting
in incorrect alias analysis results.

Fix this by explicitly modelling truncation in CastedValue in the
same way we do zext and sext. Additionally we need to disable a
number of optimizations for truncated values, in particular
"non-zero" and "non-equal" may no longer hold after truncation.
I believe the constant offset heuristic is also not necessarily
correct for truncated values, but wasn't able to come up with a
test for that one.

A possible followup here would be to use the new mechanism to
model explicit trunc as well (which should be much more common,
as it is the canonical form). This is straightforward, but omitted
here to separate the correctness fix from the analysis improvement.

(Side note: While I say "index size" above, BasicAA currently uses
the pointer size instead. Something for another day...)

Differential Revision: https://reviews.llvm.org/D110977
2021-10-22 23:47:02 +02:00
Matt Arsenault ec57b37551 AMDGPU: Use attributor to propagate amdgpu-flat-work-group-size
This can merge the acceptable ranges based on the call graph, rather
than the simple application of the attribute. Remove the handling from
the old pass.
2021-10-22 16:23:50 -04:00
Matt Arsenault 8d4b74ac3f AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set
It should be semantically identical if it was set to the same value as
the default. Also improve the documentation.
2021-10-22 16:23:50 -04:00
Craig Topper cd824f9e39 [X86] Fix bad formatting. NFC 2021-10-22 13:16:35 -07:00
Duncan P. N. Exon Smith 2410fb4616 Support: Use Expected<T>::moveInto() in a few places
These are some usage examples for `Expected<T>::moveInto()`.

Differential Revision: https://reviews.llvm.org/D112280
2021-10-22 12:40:10 -07:00
Jay Foad 58e7ec471c [AMDGPU] Run SIShrinkInstructions before post-RA scheduling
Run post-RA SIShrinkInstructions just before post-RA scheduling, instead
of afterwards. After the fixes in D112305 and D112317 this seems to make
no difference, but it paves the way for scheduler tweaks that are
sensitive to the e32 vs e64 encoding of VALU instructions.

Differential Revision: https://reviews.llvm.org/D112341
2021-10-22 20:24:03 +01:00
Jay Foad 3f34f75a68 [AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32
As described in the comment, the way we change vcc to vcc_lo in these
operands confuses addPhysRegDataDeps into treating them as implicit
pseudo operands. Fix this by setting the correct latency from the
SchedModel after addPhysRegDataDeps wrongly set it to 0.

Differential Revision: https://reviews.llvm.org/D112317
2021-10-22 20:03:29 +01:00
Jay Foad 2915889d74 [ScheduleDAGInstrs] Call adjustSchedDependency in more cases
This removes a condition and the corresponding FIXME comment, because
the Hexagon assertion it refers to has apparently been fixed, probably
by D76134.

NFCI. This just gives targets the opportunity to adjust latencies that
were set to 0 by the generic code because they involve "implicit pseudo"
operands.

Differential Revision: https://reviews.llvm.org/D112306
2021-10-22 20:03:29 +01:00
Jeremy Morse e7084ceab3 [DebugInfo][Instr] Track subregisters across stack spills/restores
Sometimes we generate code that writes to a subregister, then spills /
restores a super-register to the stack, for example:

    $eax = MOV32ri 0
    MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax
    $rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg

This patch takes a different approach: it adds another index to
MLocTracker that identifies a size/offset within a stack slot. A location
on the stack is then a pair of {FrameIndex, SlotNum}. Spilling and
restoring now involves pairing up the src/dest register numbers, and the
dest/src stack position to be transferred to/from. Location coverage
improves as a result, compile-time performance decreases, alas.

One limitation is that if a PHI occurs inside a stack slot:

    DBG_PHI %stack.0, 1

We don't know how large the resulting value is, and so might have
difficulty picking which value to use. DBG_PHI might need to be augmented
in the future with such a size.

Unit tests added ensure that spills and restores correctly transfer to
positions in the Location => Value map, and that different register classes
written to the stack will correctly clobber all other positions in the
stack slot.

Differential Revision: https://reviews.llvm.org/D112133
2021-10-22 19:20:55 +01:00
Craig Topper 93139a3c32 [LegalizeTypes] Only expand CTLZ/CTTZ/CTPOP during type promotion if the new type is legal.
We might be promoting a large non-power of 2 type and the new type
may need to be split. Once we split it we may have a ctlz/cttz/ctpop
instruction for the split type.

I'm also concerned that we may create large shifts with shift amounts
that are too small.
2021-10-22 11:02:35 -07:00
Simon Pilgrim a5f56342b0 [DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant
EXTRACT_SUBVECTOR indices are always constant, so we don't need to check for ConstantSDNode; we can just use getConstantOperandVal, which will assert if the operand is not a constant.
2021-10-22 18:32:14 +01:00
Philip Reames 412eb07edd [indvars] Use fact loop must exit to canonicalize to unsigned conditions
The logic in this patch is that if we find a comparison which would be unsigned except for when the loop is infinite, and we can prove that an infinite loop must be ill defined, we can still make the predicate unsigned.

The eventual goal (combined with a follow-on patch) is to use the fact that the loop exits to remove the zext (see tests) entirely.
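
As a sketch of that goal (not this patch; names are hypothetical and %n is
assumed known to fit in 32 bits), once the exit compare is unsigned it can
be rewritten in the narrow type, removing the zext:

    %wide = zext i32 %inc to i64
    %cmp = icmp ult i64 %wide, %n
    ; ==> when %n u< 2^32:
    %ntrunc = trunc i64 %n to i32
    %cmp = icmp ult i32 %inc, %ntrunc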

A couple of points worth noting:
* We lose the ability to prove the loop unreachable by committing to the must-exit interpretation. If we instead later proved that rhs was definitely outside the range required for finiteness, we could have killed the loop entirely. (We don't currently implement this transform, but could in theory do so.)
* simplifyAndExtend has a very limited list of users it walks. In particular, in the examples it stops at the zext and never visits the icmp. (Because we can't fold the zext to an addrec yet in SCEV.) Being willing to visit when we haven't simplified regresses multiple tests (seemingly because of less optimal results when computing trip counts). D112170 explores fixing that, but - at least so far - it appears to be too expensive compile-time wise.

Differential Revision: https://reviews.llvm.org/D111836
2021-10-22 10:31:36 -07:00
Jeremy Morse d9eebe3cd7 [DebugInfo][InstrRef] Add unit tests for transfer-function building
This patch adds some unit tests for the machine-location transfer-function
building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR
into the transfer-function building code, does it create the correct
transfer function.

There are a number of minor defects that get corrected in the process:
 * The unit test was selecting the x86 (i.e. 32 bit) backend rather than
   x86_64's 64 bit backend,
 * COPY instructions weren't actually having their subregister values
   correctly represented in the transfer function. Subregisters were being
   defined by the COPY, rather than taking the value in the source register.
 * SP aliases were at risk of being clobbered, if an SP subregister was
   clobbered.

Differential Revision: https://reviews.llvm.org/D112006
2021-10-22 18:29:03 +01:00
Craig Topper 04c184bba7 [TargetLowering] Simplify the interface of expandABS. NFC
Instead of returning a bool to indicate success and a separate
SDValue, return the SDValue and have the callers check if it is
null.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112331
2021-10-22 10:22:23 -07:00
Nikita Popov 3a10fe2d89 [Loads] Use more powerful constant folding API
This follows up on D111023 by exporting the generic "load value
from constant at given offset as given type" and using it in the
store to load forwarding code. We now need to make sure that the
load size is smaller than the store size; previously this was
implicitly ensured by ConstantFoldLoadThroughBitcast().

Differential Revision: https://reviews.llvm.org/D112260
2021-10-22 18:33:03 +02:00
Nikita Popov 5bb7562962 [Attributor] Generalize GEP construction
Make use of the getGEPIndicesForOffset() helper for creating GEPs.
This handles arrays as well, uses correct GEP index types and
reduces code duplication.

Differential Revision: https://reviews.llvm.org/D112263
2021-10-22 18:30:43 +02:00
Craig Topper 0766aef3f3 [LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later.
Expanding these requires multiple constants. If we promote during type
legalization when they'll end up getting expanded in LegalizeDAG, we'll
use larger constants. These constants may be harder to materialize.
For example, 64-bit constants on 64-bit RISCV are very expensive.

This is similar to what has already been done to BSWAP and BITREVERSE.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112268
2021-10-22 09:10:01 -07:00
Kazu Hirata 6fe949c4ed [Target, Transforms] Use StringRef::contains (NFC) 2021-10-22 08:52:33 -07:00
Jonas Paulsson 12b44bf5ee [SystemZ] Give the EXRL_Pseudo a size value of 6 bytes.
This pseudo is expanded very late (AsmPrinter) and therefore has to have a
correct size value, or the branch relaxation pass may make a wrong decision.

Review: Ulrich Weigand
2021-10-22 17:38:51 +02:00
Bradley Smith cfe22cd4ef [AArch64][SVE] Add new ld<n> intrinsics that return a struct of vscale types
This will allow us to reuse existing interleaved load logic in
lowerInterleavedLoad that exists for neon types, but for SVE fixed
types.

The goal eventually will be to replace the existing ld<n> intrinsics
with these, once a migration path has been sorted out.

Differential Revision: https://reviews.llvm.org/D112078
2021-10-22 14:13:17 +00:00
Zarko Todorovski 0bd6a9f2d1 [clang/llvm] Inclusive language: replace segregate with separate 2021-10-22 09:59:35 -04:00
Roman Lebedev 8fac9e95ad
[X86] `X86TTIImpl::getInterleavedMemoryOpCost()`: scale interleaving cost by the fraction of live members
By definition, interleaving load of stride N means:
load N*VF elements, and shuffle them into N VF-sized vectors,
with 0'th vector containing elements `[0, VF)*stride + 0`,
and 1'th vector containing elements `[0, VF)*stride + 1`.
Example: https://godbolt.org/z/df561Me5E (i64 stride 4 vf 2 => cost 6)

Now, a not-fully-interleaved load is when not all of these vectors are demanded.
So at worst, we could just pretend that everything is demanded,
and discard the non-demanded vectors. What this means is that the cost
for a not-fully-interleaved group should be no greater than the cost
for the same fully-interleaved group, but perhaps somewhat less.
Examples:
https://godbolt.org/z/a78dK5Geq (i64 stride 4 (indices 012u) vf 2 => cost 4)
https://godbolt.org/z/G91ceo8dM (i64 stride 4 (indices 01uu) vf 2 => cost 2)
https://godbolt.org/z/5joYob9rx (i64 stride 4 (indices 0uuu) vf 2 => cost 1)

Right now, for such not-fully-interleaved loads we just use the costs
for fully-interleaved loads. But at least **in general**,
that is obviously overly pessimistic, because **in general**,
not all the shuffles needed to perform the full interleaving
will end up being live.

So what this does is naively scale the interleaving cost
by the fraction of the live members. I believe this should still result
in a cost estimate in the right ballpark, although it may be an over- or under-estimate.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112307
2021-10-22 16:33:58 +03:00
Jay Foad 74cd4dee20 [AMDGPU] Preserve deadness of vcc when shrinking instructions
This doesn't have any effect on codegen now, but it might do in the
future if we shrink instructions before post-RA scheduling, which is
sensitive to live vs dead defs.

Differential Revision: https://reviews.llvm.org/D112305
2021-10-22 14:22:24 +01:00
Simon Pilgrim a750332d77 AMDGPULibCalls - constify some FuncInfo& arguments. NFCI. 2021-10-22 12:10:58 +01:00
Simon Pilgrim 99a64cc9da AMDGPULibCalls::parseFunctionName - use reference instead of pointer. NFCI.
parseFunctionName allowed a default null pointer, despite it being dereferenced immediately for use as a reference, and despite all callers passing the address of an existing reference.

Fixes a static analyzer warning about potential null dereferences
2021-10-22 11:45:25 +01:00
Michał Górny 66e06cc8cb [llvm] [ADT] Update llvm::Split() per Pavel Labath's suggestions
Optimize the iterator comparison logic to compare Current.data()
pointers.  Use std::tie for assignments from std::pair.  Replace
the custom class with a function returning iterator_range.

Differential Revision: https://reviews.llvm.org/D110535
2021-10-22 12:27:46 +02:00
Florian Hahn d465315679
[LLVM-C] Add LLVMAddMetadataToInst, deprecate LLVMSetInstDebugLocation.
IRBuilder has been updated to support preserving metadata in a more
general manner. This patch adds `LLVMAddMetadataToInst` and
deprecates `LLVMSetInstDebugLocation` in favor of the more
general function.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D93454
2021-10-22 11:21:28 +01:00
Fraser Cormack 74c6895b39 [RISCV] Fix missing cross-block VSETVLI insertion
This patch fixes a codegen bug, the test for which was introduced in
D112223.

When merging VSETVLIInfo across blocks, if the 'exit' VSETVLIInfo
produced by a block is found to be compatible with the VSETVLIInfo
computed as the intersection of the 'exit' VSETVLIInfo produced by the
block's predecessors, that block's 'exit' info is discarded and the
intersected value is taken in its place.

However, we have one authority on what constitutes VSETVLIInfo
compatibility and we are using it in two different contexts.

Compatibility is used in one context to elide VSETVLIs between
straight-line vector instructions. But compatibility when evaluated
between two blocks' exit infos ignores any info produced *inside* each
respective block before the exit points. As such it does not guarantee
that a block will not produce a VSETVLI which is incompatible with the
'previous' block.

As such, we must ensure that any merging of VSETVLIInfo is performed
using some notion of "strict" compatibility. I've defined this as a full
vtype match, but this is perhaps too pessimistic. Given that test
coverage in this regard is lacking -- the only change is in the failing
test -- I think this is a good starting point.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112228
2021-10-22 10:45:10 +01:00
Chen Zheng 86a5c32616 [PowerPC] iterate on the SmallSet directly; NFC 2021-10-22 06:18:07 +00:00
Chen Zheng 13755436bb [PowerPC] return early if there is no preparing candidate in the loop; NFC
This is to improve compiling time.

Differential Revision: https://reviews.llvm.org/D112196

Reviewed By: jsji
2021-10-22 05:39:51 +00:00
Chuanqi Xu ddbf196194 [Coroutines] Ignore partial lifetime markers refer of an alloca
When I was playing with Coroutines, I found that it is possible to
generate the following IR:
```
%struct = alloca ...
%sub.element = getelementptr %struct, i64 0, i64 index ; index is not zero
lifetime.marker.start(%sub.element)
% use of %sub.element
lifetime.marker.end(%sub.element)
store %struct to xxx ;  %struct is escaping!

<suspend points>
```

Then the AllocaUseVisitor would collect the lifetime marker for
sub.element and treat it as a lifetime marker of the alloca! So, judging
by the lifetime markers alone, it decides that the alloca could be put on
the stack instead of the frame.
The root cause of the bug is that AllocaUseVisitor collects the wrong
lifetime markers.

This patch fixes this.

Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D112216
2021-10-22 09:49:50 +08:00
Vitaly Buka b7ea298dfd [msan] Don't use TLS slots of noundef args
Transformations may strip the attribute from an
argument, e.g. when it is unused, which results in
a shadow offset mismatch between caller and
callee.

Stripping noundef from used arguments can be
a problem, as TLS is not going to be set
by the caller. However, that is not the goal of this
patch, and I am not aware whether that is even
possible.

Differential Revision: https://reviews.llvm.org/D112197
2021-10-21 18:35:12 -07:00
Stanislav Mekhanoshin ca0c92d6a1 [AMDGPU] Allow to use a whole register file on gfx90a for VGPRs
In a kernel which does not have calls or AGPR usage we can allocate
the whole vector register budget for VGPRs and have no AGPRs as
long as VGPRs stay addressable (i.e. below 256).

Differential Revision: https://reviews.llvm.org/D111764
2021-10-21 18:24:34 -07:00
Luís Ferreira 2e97236aac [Demangle] Rename OutputStream to OutputString
This patch is a refactor in preparation for implementing prepend afterwards. Since this changes a lot of files, and to conform with guidelines, I will separate it from the implementation of prepend. This is related to the discussion in https://reviews.llvm.org/D111414 , so please read it for more context.

Reviewed By: #libc_abi, dblaikie, ldionne

Differential Revision: https://reviews.llvm.org/D111947
2021-10-21 17:34:57 -07:00
Jack Anderson d7733f8422 [DebugInfo] Expand ability to load 2-byte addresses in dwarf sections
Some dwarf loaders in LLVM are hard-coded to only accept 4-byte and 8-byte address sizes. This patch generalizes acceptance into `DWARFContext::isAddressSizeSupported` and provides a common way to generate rejection errors.

The MSP430 target has been given new tests to cover dwarf loading cases that previously failed due to 2-byte addresses.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D111953
2021-10-21 17:31:00 -07:00
Craig Topper 996123e5e8 [TargetLowering] Simplify the interface for expandCTPOP/expandCTLZ/expandCTTZ.
There is no need to return a bool and have an SDValue output
parameter. Just return the SDValue and let the caller check if it
is null.

I have another patch to add more callers of these so I thought
I'd clean up the interface first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112267
2021-10-21 15:35:28 -07:00
Craig Topper ff37b1105d [LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG.
By expanding early it allows the shifts to be custom lowered in
LegalizeVectorOps. Then a DAG combine is able to run on them before
LegalizeDAG handles the BUILD_VECTORS for the masks used.

v16i8 shift lowering on X86 requires a mask to be applied to a v8i16
shift. The BITREVERSE expansion applied an AND mask before SHL ops and
after SRL ops. This was done to share the same mask constant for both shifts.
It looks like this patch allows DAG combine to remove the AND mask added
after v16i8 SHL by X86 lowering. This maintains the mask sharing that
BITREVERSE was trying to achieve. Prior to this patch it looks like
we kept the mask after the SHL instead which required an extra constant
pool or a PANDN to invert it.

This is dependent on D112248 because RISCV will end up scalarizing the BSWAP
portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in
LegalizeVectorOps first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112254
2021-10-21 15:23:23 -07:00
Craig Topper 458ed5fcc3 [TargetLowering][RISCV] Prevent scalarization of fixed vector bswap.
It's better to do the ands, shifts, ors in the vector domain than
to scalarize it and do those operations on each element.
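
As a rough sketch of the non-scalarized form this preserves (bswap on
<4 x i32> written out with vector and/shl/lshr/or):

    define <4 x i32> @bswap_v4i32(<4 x i32> %x) {
      %b0 = shl <4 x i32> %x, <i32 24, i32 24, i32 24, i32 24>
      %t1 = shl <4 x i32> %x, <i32 8, i32 8, i32 8, i32 8>
      %b1 = and <4 x i32> %t1, <i32 16711680, i32 16711680, i32 16711680, i32 16711680>
      %t2 = lshr <4 x i32> %x, <i32 8, i32 8, i32 8, i32 8>
      %b2 = and <4 x i32> %t2, <i32 65280, i32 65280, i32 65280, i32 65280>
      %b3 = lshr <4 x i32> %x, <i32 24, i32 24, i32 24, i32 24>
      %o0 = or <4 x i32> %b0, %b1
      %o1 = or <4 x i32> %o0, %b2
      %r  = or <4 x i32> %o1, %b3
      ret <4 x i32> %r
    }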

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112248
2021-10-21 14:34:01 -07:00
Jessica Paquette 5dc339d982 [AArch64][GlobalISel] Fold 64-bit cmps with 64-bit adds
G_ICMP is selected to an arithmetic overflow op (ADDS/SUBS/etc) with a dead
destination + a CSINC instruction.

We have a fold which allows us to combine 32-bit adds with G_ICMP.

The problem with G_ICMP is that we model it as always having a 32-bit
destination even though it can be a 64-bit operation. So, we were missing some
opportunities for 64-bit folds.

This patch teaches the fold to recognize 64-bit G_ICMPs + refactors some of
the code surrounding CSINC accordingly.

(Later down the line, I think we should probably change the way we handle G_ICMP
in general.)

Differential Revision: https://reviews.llvm.org/D111088
2021-10-21 13:51:44 -07:00
Yonghong Song 0472e83ffc BPF: emit BTF_KIND_DECL_TAG for typedef types
If a typedef type has __attribute__((btf_decl_tag("str"))) with
bpf target, emit BTF_KIND_DECL_TAG for that type in the BTF.

Differential Revision: https://reviews.llvm.org/D112259
2021-10-21 12:09:42 -07:00
Nikita Popov 1848525842 [CodeMetrics] Don't require speculatability for ephemeral values
As discussed in D112016, our current requirement of speculatability
for ephemeral values is overly strict: what we really care about is that
the instruction will be DCEd once the assume is dropped. For that
it is sufficient that the instruction is side-effect free and not
a terminator.

In particular, this allows non-dereferenceable loads to be ephemeral
values.
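
A hedged sketch (hypothetical names): the load below feeds only the
assume, so it is DCEd together with the assume once that is dropped;
that is what makes it ephemeral even though %p may not be dereferenceable:

    declare void @llvm.assume(i1)

    define void @f(i32* %p) {
      %v = load i32, i32* %p
      %cmp = icmp sgt i32 %v, 0
      call void @llvm.assume(i1 %cmp)
      ret void
    }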

Differential Revision: https://reviews.llvm.org/D112179
2021-10-21 20:30:01 +02:00
Craig Topper d55be79d75 [RISCV] Expand scalable vector CTTZ/CTLZ/CTPOP.
Differential Revision: https://reviews.llvm.org/D112233
2021-10-21 10:50:04 -07:00
Arthur Eubanks 3781a46c3c Revert "[IPT] Restructure cache to allow lazy update following invalidation [NFC]"
This reverts commit baea663a6e.

Causes crashes, e.g. https://lab.llvm.org/buildbot/#/builders/77/builds/10715.
2021-10-21 10:48:41 -07:00
Sanjay Patel 66d22b4da4 [VectorCombine] fold shuffle-of-binops with common operand
shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W)

This is motivated by an example in D111800
(although that patch avoids the problem for that particular example).

The pattern is shown in reduced form with:
https://llvm.org/PR52178
https://alive2.llvm.org/ce/z/d8zB4D
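
A minimal IR sketch of the fold (hypothetical values and mask):

    %b1 = fadd <4 x float> %x, %y
    %b2 = fadd <4 x float> %x, %w
    %r  = shufflevector <4 x float> %b1, <4 x float> %b2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
    ; ==>
    %sx = shufflevector <4 x float> %x, <4 x float> %x, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
    %yw = shufflevector <4 x float> %y, <4 x float> %w, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
    %r  = fadd <4 x float> %sx, %yw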

There is no difference on the PhaseOrdering test from D111800
because the aarch64 cost model says that the shuffle cost is 3 while
the fadd cost is 2.

Differential Revision: https://reviews.llvm.org/D111901
2021-10-21 12:37:54 -04:00
Philip Reames baea663a6e [IPT] Restructure cache to allow lazy update following invalidation [NFC]
This change restructures the cache used in IPT to point not to the first special instruction, but to the first instruction which *could* be special. That is, the cached reference is always equal to the first special, or comes before it in the block.

This avoids expensive block scans when we are removing special instructions from the beginning of the block. At the moment, this case is not heavily used, though it does trigger in GVN when doing CSE of calls. The main motivation was a change I'm no longer planning to move forward with, but the cache optimization seemed worthwhile as a minor perf win at low cost.

Differential Revision: https://reviews.llvm.org/D111768
2021-10-21 09:16:21 -07:00
Yonghong Song f6811cec84 [DebugInfo] Support typedef with btf_decl_tag attributes
Clang patch ([1]) added support for btf_decl_tag attributes with typedef
types. This patch added llvm support including dwarf generation.
For example, for typedef
   typedef unsigned * __u __attribute__((btf_decl_tag("tag1")));
   __u u;
the following shows llvm-dwarfdump result:
   0x00000033:   DW_TAG_typedef
                   DW_AT_type      (0x00000048 "unsigned int *")
                   DW_AT_name      ("__u")
                   DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c")
                   DW_AT_decl_line (1)

   0x0000003e:     DW_TAG_LLVM_annotation
                     DW_AT_name    ("btf_decl_tag")
                     DW_AT_const_value     ("tag1")

   0x00000047:     NULL

  [1] https://reviews.llvm.org/D110127

Differential Revision: https://reviews.llvm.org/D110129
2021-10-21 08:42:58 -07:00
Sanjay Patel 3888de9507 [InstCombine] generalize reassociated Demorgan folds
This updates the recent D112108 / b92412fb28
to handle the flipped logic ('or') sibling:
https://alive2.llvm.org/ce/z/Y2L6Ch
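
In the same notation as the D112108 commit message, the 'or' form is:

      %not1 = xor i32 %b, -1
      %not2 = xor i32 %c, -1
      %or1 = or i32 %a, %not1
      %or2 = or i32 %or1, %not2
    =>
      %i1 = and i32 %b, %c
      %i2 = xor i32 %i1, -1
      %or2 = or i32 %i2, %a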
2021-10-21 10:39:29 -04:00
Anirudh Prasad aa3519f178 [SystemZ][z/OS] Initial implementation for lowerCall on z/OS
- This patch provides the initial implementation for lowering a call on z/OS according to the XPLINK64 calling convention
- A series of changes have been made to SystemZCallingConv.td to account for these additional XPLINK64 changes, including adding a new helper function to shadow the stack along with allocating a register wherever appropriate
- For the cases of copying an f64 to a gr64 and an f128 / 128-bit vector type to a gr64, a `CCBitConvertToType` has been added, and the value is bitcast appropriately in the lowering phase
- Support for the ADA register (R5) will be provided in a later patch.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D111662
2021-10-21 09:48:59 -04:00
Sanjay Patel d2198771e9 [DAGCombiner] fold bit-hack form of usubsat
(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

I haven't found a generalization of this identity:
https://alive2.llvm.org/ce/z/_sriEQ
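
A hedged IR-level sketch of the identity (the combine itself runs on DAG
nodes; function names are hypothetical):

    declare i8 @llvm.usub.sat.i8(i8, i8)

    define i8 @src(i8 %x) {
      %xor = xor i8 %x, 128    ; flips the sign bit
      %sra = ashr i8 %x, 7     ; all-ones iff the sign bit was set
      %and = and i8 %xor, %sra ; X - 128 if X u>= 128, else 0
      ret i8 %and
    }

    define i8 @tgt(i8 %x) {
      %r = call i8 @llvm.usub.sat.i8(i8 %x, i8 128)
      ret i8 %r
    }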

Note: I was actually looking at the first form of the pattern in that link,
but that's part of a long chain of potential missed transforms in codegen
and IR....that I hope ends here!

The predicates for when this is profitable are a bit tricky. This version of
the patch excludes multi-use but includes custom lowering (as opposed to
legal only).

On x86 for example, we have custom lowering for some vector types, and that
uses umax and sub. So to enable that fold, we need to add use checks to avoid
regressions. Even with legal-only lowering, we could see code with extra
reg move instructions for extra uses, so that constraint would have to be
eased very carefully to avoid penalties.

Differential Revision: https://reviews.llvm.org/D112085
2021-10-21 09:47:19 -04:00
Alexey Bataev 3ea7877c8b [SLP]Unify vectorization of PHI and store nodes with improved tiny tree vectorization.
Vectorization of PHIs and stores is very similar; it might be beneficial to
try to revectorize stores (like PHIs) if the total number of stores with
the same/alternate opcode is less than the vector size but number of
stores with the same type is larger than the vector size.

Differential Revision: https://reviews.llvm.org/D109831
2021-10-21 06:25:32 -07:00
Kerry McLaughlin 0d153df69e [SVE] Fix selection failure when splitting extended masked loads
When splitting a masked load, `GetDependentSplitDestVTs` is used to get the
MemVTs of the high and low parts. If the masked load is extended, this
may return VTs with different element types which are used to create the
high & low masked load instructions.
This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with
the same element type.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111996
2021-10-21 13:04:38 +01:00
YunQiang Su 302a165e18 [MIPS] Fix switching between 32/64-bit variants of r6 target triples
If the clang driver gets a 64-bit r6 target triple like `mipsisa64r6` and
an additional option forces switching to generation of 32-bit code, it
loses the r6 ABI and generates 32-bit r2-r5 ABI code.

```
$ clang -target mipsisa64r6-linux-gnu -mabi=32
```

This patch fixes the problem.

- Add optional `SubArchType` argument to the `Triple::setArch()` method.
- Implement generation of mips r6 target triples in the
  `Triple::getArchName()` method.

Differential Revision: https://reviews.llvm.org/D110514.diff
2021-10-21 15:04:07 +03:00
Dawid Jurczak 9ba5bb4309 [NFC][LoopIdiom] Make for loops more readable
Patch simplifies for loops in LIR following LLVM guidelines: https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible.

Differential Revision: https://reviews.llvm.org/D112077
2021-10-21 12:17:44 +02:00
Evgeniy Brevnov 1a8ec24efb [NARY-REASSOCIATE][NFC] Simplify min/max handling
In order to explore different variants of reassociation, the current implementation uses a "swap in a loop" approach. Unfortunately, the implementation is more complicated than it could be. This is an attempt to streamline the code. The new approach is to extract the core functionality into a helper function and call it explicitly as many times as required.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112128
2021-10-21 15:45:53 +07:00
David Sherwood 9448cdc900 [SVE][Analysis] Tune the cost model according to the tune-cpu attribute
This patch introduces a new function:

  AArch64Subtarget::getVScaleForTuning

that returns a value for vscale that can be used for tuning the cost
model when using scalable vectors. The VScaleForTuning option in
AArch64Subtarget is initialised according to the following rules:

1. If the user has specified the CPU to tune for we use that, else
2. If the target CPU was specified we use that, else
3. The tuning is set to "generic".

For CPUs of type "generic" I have assumed that vscale=2.

New tests added here:

  Analysis/CostModel/AArch64/sve-gather.ll
  Analysis/CostModel/AArch64/sve-scatter.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D110259
2021-10-21 09:33:50 +01:00
Vitaly Buka 6742c8a2d8 [NFC][msan] Break the loop when done
We have nothing to do after the Argument
is found.
2021-10-20 21:08:12 -07:00
Arthur Eubanks 6ea7437ca5 [SelectionDAG] Bail out of mergeTruncStores when not optimizing
With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores.

Fixes PR51827.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111596
2021-10-20 16:58:22 -07:00
Sanjay Patel 40163f1df8 [x86] add special-case lowering for usubsat for AVX512
This is a small extension of D112095 to avoid another regression
seen with D112085.
In this case, we allow the same conversion from usubsat to ALU
ops if the target supports vpternlog.

That pattern will get converted later in X86DAGToDAGISel::tryVPTERNLOG().
This seems better than putting a magic immediate constant directly in
this code to create the exact vpternlog that we need. It's possible that
there are other special-cases along these lines, so we should try to
keep all of the vpternlog magic in one place.

Differential Revision: https://reviews.llvm.org/D112138
2021-10-20 16:41:13 -04:00
Stanislav Mekhanoshin b92412fb28 [InstCombine] Fold `(a & ~b) & ~c` to `a & ~(b | c)`
  %not1 = xor i32 %b, -1
  %not2 = xor i32 %c, -1
  %and1 = and i32 %a, %not1
  %and2 = and i32 %and1, %not2
=>
  %i1 = or i32 %b, %c
  %i2 = xor i32 %i1, -1
  %and2 = and i32 %i2, %a

Differential Revision: https://reviews.llvm.org/D112108
2021-10-20 13:05:46 -07:00
Florian Hahn 8977bd5806
[IndVars] Invalidate SCEV when IR is changed in rewriteLoopExitValue.
At the moment, rewriteLoopExitValue forgets the current phi node in the
loop that collects phis to rewrite. A few lines after the value is
forgotten, SCEV is used again to analyze incoming values and
potentially expand SCEV expression. This means that another SCEV is
created for PN, before the IR is actually updated in the next loop.

This leads to accessing invalid cached expression in combination with
D71539.

PN should only be changed once the actual incoming exit value is set in
the next loop. Moving invalidation there should ensure that PN is
invalidated in all relevant cases.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D111495
2021-10-20 20:48:33 +01:00
Jon Roelofs b046eb19b8 [AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0
https://godbolt.org/z/h8ejrG4hb
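
A small sketch of the identity (shown as IR for brevity; the combine
itself matches G_AND/G_OR in generic MIR):

    ; with c1 & c2 == 0, every bit the 'or' can set is cleared again:
    %o = or i32 %x, 240    ; c1 = 0xF0
    %r = and i32 %o, 15    ; c2 = 0x0F
    ; ==>
    %r = and i32 %x, 15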

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111856
2021-10-20 12:11:52 -07:00
Stanislav Mekhanoshin c80d8a8cea [AMDGPU] MachineLICM cannot hoist VALU
MachineLoop::isLoopInvariant() returns false for all VALU
because of the exec use. Check TII::isIgnorableUse() to
allow hoisting.

That unfortunately results in higher register consumption
since MachineLICM does not adequately estimate pressure.
Therefore I think it shall only be enabled after D107677 even
though it does not depend on it.

Differential Revision: https://reviews.llvm.org/D107859
2021-10-20 11:47:24 -07:00
Stanislav Mekhanoshin 6185835656 [AMDGPU] Allow rematerialization of SOP with virtual registers
D106408 was doing this for all targets, although it was
reverted due to a couple of performance regressions on some targets.
The difference for AMDGPU is the ability to rematerialize SOP
instructions with virtual register uses like we already do for VOP.

Differential Revision: https://reviews.llvm.org/D110743
2021-10-20 11:46:50 -07:00
Leonard Grey 5d57578a4e [MC] Recursively calculate symbol offset
This is speculative since I'm not sure if there's some implicit contract that a
variable symbol must not have another variable symbol in its evaluation tree.

Downstream bug: https://bugs.chromium.org/p/chromium/issues/detail?id=471146#c23.

Test is based on alias.s (removed checks since we just need to know it didn't
crash).

Differential Revision: https://reviews.llvm.org/D109109
2021-10-20 14:29:43 -04:00
Sanjay Patel 80ab06c599 [InstCombine] fold fake vector insert to bit-logic
bitcast (inselt (bitcast X), Y, 0) --> or (and X, MaskC), (zext Y)

https://alive2.llvm.org/ce/z/Ux-662
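
A hedged sketch with concrete types (hypothetical example; little-endian
lane numbering assumed, where element 0 is the low byte):

    define i32 @src(i32 %x, i8 %y) {
      %v = bitcast i32 %x to <4 x i8>
      %ins = insertelement <4 x i8> %v, i8 %y, i64 0
      %r = bitcast <4 x i8> %ins to i32
      ret i32 %r
    }
    ; ==>
    define i32 @tgt(i32 %x, i8 %y) {
      %and = and i32 %x, -256    ; MaskC: clear the low byte
      %z = zext i8 %y to i32
      %r = or i32 %and, %z
      ret i32 %r
    }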

Similar to D111082 / db231ebdb0 :
We want to avoid relatively opaque vector ops on types that are
likely supported by the backend as scalar integers. The bitwise
logic ops are more likely to allow further combining.

We probably want to generalize this to allow a shift too, but
that would oppose instcombine's general rule of not creating
extra instructions, so that's left as a potential follow-up.
Alternatively, we could do that transform in VectorCombine
with the help of the TTI cost model.

This is part of solving:
https://llvm.org/PR52057
2021-10-20 14:21:40 -04:00
Arthur Eubanks 00500d5bad [NFC] De-template LazyCallGraph::visitReferences() and move into .cpp file
This makes changing it and recompiling it much faster.
2021-10-20 10:50:00 -07:00
Itay Bookstein 08ed216000 [IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
2021-10-20 10:29:47 -07:00
Zhi An Ng e1fb13401e [WebAssembly] Add prototype relaxed float min max instructions
Add relaxed f32x4.min, f32x4.max, f64x2.min, and f64x2.max. These are only
exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112146
2021-10-20 09:41:51 -07:00
Fraser Cormack eabf11f9ea [CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz
This patch fixes a crash when despeculating ctlz/cttz intrinsics with
scalable-vector types. It is not safe to speculatively get the size of
the vector type in bits in case the vector type is not a fixed-length type. As
it happens this isn't required as vector types are skipped anyway.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112141
2021-10-20 16:45:55 +01:00
Craig Topper fe1f0de003 [RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors.
Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP
isn't legal or custom for a vector type we would scalarize the
CTLZ/CTTZ. This is different than CTPOP itself which would use a
vector expansion.

This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP
expansion instead of scalarizing. To do this I had to add additional
checks to make sure the operations used by CTPOP expansions are all
supported. Some of the operations were already needed for the CTLZ/CTTZ
expansion.
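
As a sketch, one CTPOP-based identity such an expansion can rely on
(scalar i32 shown; the vector form is elementwise):

    ; cttz(x) == ctpop(~x & (x - 1)), with cttz(0) == 32
    declare i32 @llvm.ctpop.i32(i32)

    define i32 @cttz_via_ctpop(i32 %x) {
      %m  = sub i32 %x, 1
      %nx = xor i32 %x, -1
      %lo = and i32 %nx, %m    ; mask of the trailing zero bits
      %r  = call i32 @llvm.ctpop.i32(i32 %lo)
      ret i32 %r
    }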

This is a huge improvement for RISCV, which doesn't have a scalar
ctlz or cttz in the base ISA.

For WebAssembly, I've added Custom lowering to keep the scalarizing
behavior. I've also extended the scalarizing to CTPOP.

Differential Revision: https://reviews.llvm.org/D111919
2021-10-20 07:46:41 -07:00
Sanjay Patel 3efd2a0bec [x86] make helper for useVPTERNLOG; NFC
See D112085 for another use case.
2021-10-20 10:26:53 -04:00
Jeremy Morse 89950ade21 [DebugInfo][InstrRef] Track a single variable at a time
Here's another performance patch for InstrRefBasedLDV: rather than
processing all variable values in a scope at a time, instead, process one
variable at a time. The benefits are twofold:
 * It's easier to reason about one variable at a time in your mind,
 * It improves performance, apparently from increased locality.

The downside is that the value-propagation code gets indented one level
further, plus there's some churn in the unit tests.

Differential Revision: https://reviews.llvm.org/D111799
2021-10-20 15:03:52 +01:00
Sander de Smalen be6c8dc765 [SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors.
When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch, that didn't yet happen, so it would generate the
wrong offset, thus storing a subvector to the incorrect address
and overwriting the wrong lanes.

For some insert:
  nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)

The offset was not scaled by vscale:
  orr     x8, x8, #0x4
  st1h    { z0.h }, p0, [sp]
  st1h    { z1.d }, p1, [x8]
  ld1h    { z0.h }, p0/z, [sp]

And is changed to:
  mov x8, sp
  st1h { z0.h }, p0, [sp]
  st1h { z1.d }, p1, [x8, #1, mul vl]
  ld1h { z0.h }, p0/z, [sp]

Differential Revision: https://reviews.llvm.org/D111633
2021-10-20 13:55:24 +01:00
Simon Pilgrim 5b395bd633 [CostModel][X86] Add costs for multiply-by-pow2 constants
These are folded to left shifts in the backend.
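
E.g., a hedged sketch:

    %m = mul i32 %x, 8
    ; is lowered in the backend as:
    %m = shl i32 %x, 3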

We should be able to extend this for multiply-by-negpow2 after D111968 has landed to resolve PR51436
2021-10-20 13:11:21 +01:00
Simon Pilgrim 9fc523d114 [X86] Remove X86ProcFamilyEnum::IntelSLM
Replace X86ProcFamilyEnum::IntelSLM enum with a TuningUseSLMArithCosts flag instead, matching what we already do for Goldmont.

This just leaves X86ProcFamilyEnum::IntelAtom to replace with general Tuning/Feature flags and we can finally get rid of the old X86ProcFamilyEnum enum.

Differential Revision: https://reviews.llvm.org/D112079
2021-10-20 11:58:39 +01:00
Daniel Kiss f903c85055 [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
The autiasp and autibsp instructions are the counterparts of the paciasp/pacibsp
instructions, so let's emit .cfi_negate_ra_state for these too.
With the Armv8.3 instruction set, retaa/retab perform the return and authentication
in one step; here we can't emit .cfi_negate_ra_state because that would point after
the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
2021-10-20 11:03:52 +02:00
Joerg Sonnenberger ec428f7b78 [SPARC] Recognize the prefetch instruction
Reviewed By: LemonBoy

Differential Revision: https://reviews.llvm.org/D96311
2021-10-20 10:59:01 +02:00
Paulo Matos 6d0c7bc17d [WebAssembly] Implementation of table.get/set for reftypes in LLVM IR
This change implements new DAG nodes TABLE_GET/TABLE_SET, and lowering
methods for load and stores of reference types from IR arrays. These
global LLVM IR arrays represent tables at the Wasm level.

Differential Revision: https://reviews.llvm.org/D111154
2021-10-20 10:31:31 +02:00
Zi Xuan Wu de10a02fc0 [CSKY] Complete to add basic integer instruction set
Complete the basic integer instruction set and add related predictor in CSKY.td.
And it includes the instruction definition and asm parser support.

Differential Revision: https://reviews.llvm.org/D111701
2021-10-20 15:50:44 +08:00
Evgeniy Brevnov 269f563a2b [NARY-REASSOCIATE] Fix infinite recursion optimizing min/max
To guarantee convergence of the algorithm, each optimization step should decrease the number of instructions when the IR is modified. This property does not hold in this test case. The problem is that SCEV Expander may do "unexpected" reassociation, which results in the creation of new min/max chains and the introduction of extra instructions. As a result, we indefinitely optimize back and forth on each step.

The solution is to keep SCEV Expander from performing uncontrolled reassociations by means of "Unknown" expressions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112060
2021-10-20 14:23:03 +07:00
Zhi An Ng 2542bfa43a [WebAssembly] Add prototype relaxed swizzle instructions
Add i8x16 relaxed_swizzle instructions. These are only
exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112022
2021-10-19 17:53:04 -07:00
Yonghong Song cd40b5a712 BPF: set .BTF and .BTF.ext section alignment to 4
Currently, .BTF and .BTF.ext have a default alignment of 1.
For example,
  $ cat t.c
    int foo() { return 0; }
  $ clang -target bpf -O2 -c -g t.c
  $ llvm-readelf -S t.o
    ...
    Section Headers:
    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
    ...
    [ 7] .BTF              PROGBITS        0000000000000000 000167 00008b 00      0   0  1
    [ 8] .BTF.ext          PROGBITS        0000000000000000 0001f2 000050 00      0   0  1

But to avoid misaligned data accesses, .BTF and .BTF.ext
actually require an alignment of 4. Misalignment is not an issue
for architectures like x64/arm64, which handle it well. But
some architectures like mips may incur a trap if .BTF/.BTF.ext
is not properly aligned.

This patch explicitly forces the .BTF and .BTF.ext alignment to be 4.
For the above example, we will have
    [ 7] .BTF              PROGBITS        0000000000000000 000168 00008b 00      0   0  4
    [ 8] .BTF.ext          PROGBITS        0000000000000000 0001f4 000050 00      0   0  4

Differential Revision: https://reviews.llvm.org/D112106
2021-10-19 16:26:01 -07:00
Artem Belevich b6b7fe60a4 [NVPTX] Add a late SROA pass which allows optimizing away more allocas.
Fixes performance regression https://bugs.llvm.org/show_bug.cgi?id=52037

Differential Revision: https://reviews.llvm.org/D111471
2021-10-19 16:18:28 -07:00
Yuta Saito 1813fde9cc [WebAssembly] Emit clangast in custom section aligned by 4 bytes
Emit __clangast in a custom section instead of a named data segment,
so it can be found while iterating sections.
This could be avoided if all data segments (in the wasm sense) were
represented as their own sections (in the llvm sense).
This can be resolved by https://github.com/WebAssembly/tool-conventions/issues/138

Also, the on-disk hashtable in clangast needs to be aligned to 4 bytes,
so add padding in the name length field of the custom section header.

The length of the clangast section name can be represented in 1 byte
of leb128, and at most 3 bytes of padding are possible, so the section
name length won't become invalid in theory.

Fixes https://bugs.llvm.org/show_bug.cgi?id=35928

Differential Revision: https://reviews.llvm.org/D74531
2021-10-19 15:50:08 -07:00
Sanjay Patel 92a0389b04 [x86] add special-case lowering for usubsat for pre-SSE4
usubsat X, SMIN --> (X ^ SMIN) & (X s>> BW-1)

This would be a regression with D112085 where we combine to
usubsat more aggressively, so avoid that by matching the
special-case where we are subtracting SMIN (signmask):
https://alive2.llvm.org/ce/z/4_3gBD

Differential Revision: https://reviews.llvm.org/D112095
2021-10-19 17:13:16 -04:00
Bjorn Pettersson 9c44a0996c [SCEV] Fix formatting error introduced by D112080
Accidentally pushed D112080 without this clang-format cleanup.
2021-10-19 21:44:07 +02:00
Bjorn Pettersson 08619006a0 [SCEV] Avoid compile time explosion in ScalarEvolution::isImpliedCond
As seen in PR51869 the ScalarEvolution::isImpliedCond function might
end up spending lots of time when doing the isKnownPredicate checks.

Calling isKnownPredicate, for example, results in isKnownViaInduction
being called, which might result in isLoopBackedgeGuardedByCond being
called, and then we might get one or more new calls to isImpliedCond.
Even if the scenario described here isn't an infinite loop, using
some randomly generated C programs as input indicates that those
isKnownPredicate checks quite often return true. On the other hand,
the third condition that needs to be fulfilled in order to "prove
implications via truncation", i.e. the isImpliedCondBalancedTypes
check, is rarely fulfilled.
I also made some similar experiments to look at how often we would
get the same result when using isKnownViaNonRecursiveReasoning instead
of isKnownPredicate. So far I haven't seen a single case when codegen
is negatively impacted by using isKnownViaNonRecursiveReasoning. On
the other hand, it seems like we get rid of the compile time explosion
seen in PR51869 that way. Hence this patch.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112080
2021-10-19 21:37:57 +02:00
Philip Reames 0836a1059d Extend transform introduced in D111896 to multiple exits
This is trivial.  It was left out of the original review only because we had multiple copies of the same code in review at the same time, and keeping them in sync was easiest if the structure was kept in sync.
2021-10-19 12:12:19 -07:00
Philip Reames fca0218875 [indvars] Canonicalize exit conditions to unsigned using range info
This patch duplicates a bit of logic we apply to comparisons encountered during the IV users walk to conditions which feed exit conditions. Why? simplifyAndExtend has a very limited list of users it walks. In particular, in the examples it stops at the zext and never visits the icmp. (Because we can't fold the zext to an addrec yet in SCEV.) Being willing to visit when we haven't simplified regresses multiple tests (seemingly because of less optimal results when computing trip counts).

Note that this can be trivially extended to multiple exiting blocks. I'm leaving that to a future patch (solely to cut down on the number of versions of the same code in review at once.)

Differential Revision: https://reviews.llvm.org/D111896
2021-10-19 11:49:12 -07:00
Anna Thomas 9403514e76 [LoopPredication] Calculate profitability without BPI
Using BPI within loop predication is non-trivial because BPI is only
preserved lossily in loop pass manager (one fix exposed by lossy
preservation is up for review at D111448). However, since loop
predication is only used in downstream pipelines, it is hard to keep BPI
from breaking, given its incompletely preserved state, as BPI changes upstream.
Also, correctly preserving BPI for all loop passes is a non-trivial
undertaking (D110438 does this lossily), while the benefit of using it
in loop predication isn't clear.

In this patch, we rely on profile metadata to get almost similar benefit as
BPI, without actually using the complete heuristics provided by BPI.
This avoids the compile time explosion we tried to fix with D110438 and
also avoids fragile bugs because BPI can be lossy in loop passes
(D111448).

Reviewed-By: asbirlea, apilipenko
Differential Revision: https://reviews.llvm.org/D111668
2021-10-19 14:24:04 -04:00
Arthur Eubanks ac0561ebb7 [Verifier] Add context for assume operand bundles verifier errors
And fix a typo.
2021-10-19 09:52:04 -07:00
Jamie Schmeiser 3af474c0a1 Changes to print-changed classes in preparation for DotCfg change printer
Summary:
Break out non-functional changes to the print-changed classes that are needed
for reuse with the DotCfg change printer in https://reviews.llvm.org/D87202.

Various changes to the change printers to facilitate reuse with the
upcoming DotCfg change printer. This includes changing several of
the classes and their support classes to being templates. Also,
some template parameter names were simplified to avoid confusion
with planned identifiers in the DotCfg change printer to come. A
virtual function in the class for comparing functions was changed
to a lambda. The virtual function 'same' was replaced with calls to
operator==. The only intentional functional change was to add the exe name
as the first parameter to llvm::sys::ExecuteAndWait.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D110737
2021-10-19 10:58:40 -04:00
David Sherwood 5ea35791e6 [AArch64] Split out processor/tuning features
Following on from an earlier patch that introduced support for -mtune
for AArch64 backends, this patch splits out the tuning features
from the processor features. This gives us the ability to enable
architectural feature set A for a given processor with "-mcpu=A"
and define the set of tuning features B with "-mtune=B".

It's quite difficult to write a test that proves we select the
right features according to the tuning attribute because most
of these relate to scheduling. I have created a test here:

  CodeGen/AArch64/misched-fusion-addr-tune.ll

that demonstrates the different scheduling choices based upon
the tuning.

Differential Revision: https://reviews.llvm.org/D111551
2021-10-19 15:18:55 +01:00
David Sherwood 607fb1bb8c [AArch64] Always add -tune-cpu argument to -cc1 driver
This patch ensures that we always tune for a given CPU on AArch64
targets when the user specifies the "-mtune=xyz" flag. In the
AArch64Subtarget if the tune flag is unset we use the CPU value
instead.

I've updated the release notes here:

  llvm/docs/ReleaseNotes.rst

and added tests here:

  clang/test/Driver/aarch64-mtune.c

Differential Revision: https://reviews.llvm.org/D110258
2021-10-19 14:57:51 +01:00
Simon Pilgrim 71e39e3f18 [ADT] Add APInt::isNegatedPowerOf2() helper
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....

Differential Revision: https://reviews.llvm.org/D111998
2021-10-19 14:38:21 +01:00
Jeremy Morse 849b17949f [DebugInfo][InstrRef] Avoid un-necessary densemap copies and comparisons
This is purely a performance patch: InstrRefBasedLDV used to use three
DenseMaps to store variable values, two for long term storage and one as a
working set. This patch eliminates the working set, and updates the long
term storage in place, thus avoiding two DenseMap comparisons and two
DenseMap assignments, which can be expensive.

Differential Revision: https://reviews.llvm.org/D111716
2021-10-19 11:10:14 +01:00
Lasse Folger 134e1817f6 [lldb] change name demangling to be consistent between windows and linux
When printing names in lldb on windows, these names contain the full type information, while on linux they contain only the name.

This change introduces a flag in the Microsoft demangler to control whether the type information should be included.
With the flag enabled, the demangled name contains only the qualified name, e.g:
without flag -> with flag
int (*array2d)[10] -> array2d
int (*abc::array2d)[10] -> abc::array2d
const int *x -> x

For globals there is a second inconsistency which is not yet addressed by this change. On linux globals (in global namespace) are prefixed with :: while on windows they are not.

Reviewed By: teemperor, rnk

Differential Revision: https://reviews.llvm.org/D111715
2021-10-19 12:04:37 +02:00
Jeremy Morse cf033bb2d3 [DebugInfo][NFC] Zero-initialize a class field
This field gets assigned when the relevant object starts being used; but it
remains uninitialized beforehand. This risks introducing hard-to-detect
bugs if something changes, so zero-initialize the field.
2021-10-19 10:24:12 +01:00
Lang Hames cc3115cd1d [JITLink][x86-64] Lift GOT, PLT table managers into x86_64.h; reuse for MachO.
This lifts the global offset table and procedure linkage table builders out of
ELF_x86_64.h and into x86_64.h, renaming them with generic names
x86_64::GOTTableBuilder and x86_64::PLTTableBuilder. MachO_x86_64.cpp is updated
to use these classes instead of the older PerGraphGOTAndStubsBuilder tool.
2021-10-18 21:47:24 -07:00
Noah Shutty e678c51177 [Support][ThinLTO] Move ThinLTO caching to LLVM Support library
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.

Patch By: noajshu

Differential Revision: https://reviews.llvm.org/D111371
2021-10-18 18:57:25 -07:00
Hsiangkai Wang facff468b6 [RISCV] Reorder the vector register allocation order.
GPR uses argument registers as the first group of registers to allocate.
This patch uses vector argument registers, v8 to v23, as the first group
to allocate.

Differential Revision: https://reviews.llvm.org/D111304
2021-10-19 09:30:13 +08:00
Lang Hames bc03a9c066 Simplify the TableManager class and move it into a public header.
Moves visitEdge into the TableManager derivatives, replacing the fixEdgeKind
methods in those classes. The visitEdge method takes on responsibility for
updating the edge target, as well as its kind.
2021-10-18 18:20:33 -07:00
Anshil Gandhi 0567f03331 [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
By default clang emits complete constructors as aliases of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Reviewed By: yaxunl, #amdgpu

Differential Revision: https://reviews.llvm.org/D109707
2021-10-18 16:53:15 -06:00
Simon Pilgrim a83384498b [X86] combineMulToPMADDWD - replace ASHR(X,16) -> LSHR(X,16)
If we're using an ashr to sign-extend the entire upper 16 bits of the i32 element, then we can replace with a lshr. The sign bit will be correctly shifted for PMADDWD's implicit sign-extension and the upper 16 bits are zero so the upper i16 sext-multiply is guaranteed to be zero.

The lshr also has a better chance of folding with shuffles etc.
2021-10-18 22:12:56 +01:00
Arthur Eubanks ecd25edfc5 [InlineCost] Add empty line between call sites when printing inline costs 2021-10-18 13:56:48 -07:00
Craig Topper 1053e0b27c [RISCV] Use a lambda to avoid having the Support library depend on Option library.
RISCVISAInfo::toFeatures needs to allocate strings using
ArgList::MakeArgString, but toFeatures lives in Support and
MakeArgString lives in Option.

toFeatures only has one caller, so the simple fix is to have that
caller pass a lambda that wraps MakeArgString to break the
dependency.

Differential Revision: https://reviews.llvm.org/D112032
2021-10-18 13:39:37 -07:00
Matt Morehouse 431a5d8411 [x86] Implement a tagged-globals backend feature.
The feature tells the backend to allow tags in the upper bits of global
variable addresses.  These tags will be ignored by upcoming CPUs with
the Intel LAM feature but may be used in instrumentation passes (e.g.,
HWASan).

This patch implements the feature by using @GOTPCREL relocations instead
of direct references to the locally defined global.  Thus the full
tagged address can be loaded by a single instruction:
  movq global@GOTPCREL(%rip), %rax

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D111343
2021-10-18 13:31:10 -07:00
Alexandros Lamprineas 04dc68710a [DebugInfo][ARM] Fix incorrect debug information for RWPI accessed globals
When compiling for the RWPI relocation model the debug information is wrong:

* the debug location is described as { DW_OP_addr Var }
  instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus }
* the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32

Differential Revision: https://reviews.llvm.org/D111404
2021-10-18 21:29:46 +01:00
Arthur Eubanks b8ce97372d [NewPM] Add PipelineTuningOption to eagerly invalidate analyses
This trades off more compile time for less peak memory usage. Right now
it invalidates all function analyses after a module->function or
cgscc->function adaptor.

https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=max-rss

For now this is just experimental.

See comments on why this may affect optimizations.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D111575
2021-10-18 13:20:35 -07:00
Alexey Bataev b9cfa016da [SLP]Fix emission of the shrink shuffles.
Need to follow the order of the reused scalars from the
ReuseShuffleIndices mask rather than rely on the natural order.

Differential Revision: https://reviews.llvm.org/D111898
2021-10-18 13:13:12 -07:00
modimo 313c657fce [InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from dwarf information and generating the equivalent remarks.

This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself.

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D110658
2021-10-18 13:08:39 -07:00
Nikita Popov 54d868991a [ExpandMemCmp] Update CFG before DTU
The applyUpdates() API requires that the CFG is already updated,
so make sure to insert the new terminator first.
2021-10-18 21:49:47 +02:00
Petr Hosek 8e46e34d24 Revert "[Support][ThinLTO] Move ThinLTO caching to LLVM Support library"
This reverts commit 92b8cc52bb since
it broke the gold plugin.
2021-10-18 12:24:05 -07:00
Noah Shutty 92b8cc52bb [Support][ThinLTO] Move ThinLTO caching to LLVM Support library
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.

Patch By: noajshu

Differential Revision: https://reviews.llvm.org/D111371
2021-10-18 12:08:49 -07:00
Yonghong Song f4a8526cc4 [NFC][BPF] fix comments and rename functions related to BTF_KIND_DECL_TAG
There is no functionality change.
Fix some comments and rename processAnnotations() to
processDeclAnnotations() to avoid confusion when later
BTF_KIND_TYPE_TAG is introduced (https://reviews.llvm.org/D111199).
2021-10-18 10:43:45 -07:00
Jon Roelofs 1300677f97 [AArch64][GlobalISel] combine and + [la]sr => ubfx
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111839
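
A hedged C++ illustration of the combined pattern (the constants are
example values, not taken from the patch):

  #include <cstdint>

  // (and (lsr x, 3), 0x1f): extract 5 bits starting at bit 3. With this
  // combine, AArch64 GlobalISel can select a single instruction, e.g.
  //   ubfx w0, w0, #3, #5
  uint32_t extractField(uint32_t x) {
    return (x >> 3) & 0x1f;
  }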
2021-10-18 10:33:01 -07:00
Yonghong Song e9e4fc0fd3 BPF: fix a bug in IRPeephole pass
Commit 009f3a89d8 ("BPF: remove intrinsics @llvm.stacksave()
and @llvm.stackrestore()") implemented an IRPeephole pass to remove
the llvm.stacksave()/stackrestore() intrinsics.
Buildbot reported a failure:
  UNREACHABLE executed at ../lib/IR/LegacyPassManager.cpp:1445!
which is:
  llvm_unreachable("Pass modifies its input and doesn't report it");

The pass changed the code but didn't report it by returning true.
This patch fixes that problem.
2021-10-18 10:18:24 -07:00
Florian Hahn e844f05397
[LoopUtils] Simplify addRuntimeCheck to return a single value.
This simplifies the return value of addRuntimeCheck from a pair of
instructions to a single `Value *`.

The existing users of addRuntimeChecks were ignoring the first element
of the pair, hence there is no reason to track FirstInst and return
it.

Additionally all users of addRuntimeChecks use the second returned
`Instruction *` just as `Value *`, so there is no need to return an
`Instruction *`. Therefore there is no need to create a redundant
dummy `and X, true` instruction any longer.

Effectively this change should not impact the generated code because the
redundant AND will be folded by later optimizations. But it is easy to
avoid creating it in the first place and it allows more accurately
estimating the cost of the runtime checks.
2021-10-18 18:03:09 +01:00
Craig Topper 84d9bc51a3 [RISCV] Rewrite forwardCopyWillClobberTuple to not assume that there are exactly 32 registers. NFC
This function was copied from ARM where register pairs/triples/quads can wrap around the 32-register encoding space. So register 31 can pair with register 0. This is not true for RISCV vectors. The spec specifically mentions the possibility of a future encoding that has more than 32 registers.

This patch removes the modulo from the code and directly checks that the destination register is in the source register range and not at the beginning of the range. Though I don't expect an identity copy will occur.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D111467
2021-10-18 09:57:38 -07:00
Yonghong Song 009f3a89d8 BPF: remove intrinsics @llvm.stacksave() and @llvm.stackrestore()
Paul Chaignon reported a bpf verifier failure ([1]) due to using
non-ABI register R11. For the test case, llvm11 is okay while
llvm12 and later generates verifier unfriendly code.

The failure is related to variable length array size.
The following mimics the variable length array definition
in the test case:

struct t { char a[20]; };
void foo(void *);
int test() {
   const int a = 8;
   char tmp[AA + sizeof(struct t) + a];
   foo(tmp);
   ...
}

Paul helped bisect that the following llvm commit is
responsible:

552c6c2328 ("PR44406: Follow behavior of array bound constant
              folding in more recent versions of GCC.")

Basically, before the above commit, clang frontend did constant
folding for array size "AA + sizeof(struct t) + a" to be 68,
so used alloca for stack allocation. After the above commit,
clang frontend didn't do constant folding for array size
any more, which results in a VLA and llvm.stacksave/llvm.stackrestore
is generated.

The BPF architecture API does not support a stack pointer (sp) register.
LLVM internally uses R11 to represent the sp register, but it should
not appear in the final code; otherwise, the kernel verifier will reject it.

An earlier patch ([2]) tried to fix the issue in the clang frontend.
But the upstream discussion considered a frontend fix to be a hack;
the backend should properly undo llvm.stacksave/llvm.stackrestore.
This patch implements a BPF IR pass to remove these intrinsics
unconditionally. If the alloca can eventually be resolved with a
constant size, r11 will not be generated. If the alloca cannot be
resolved with a constant size, SelectionDAG will complain, the same
as without this patch.

 [1] https://lore.kernel.org/bpf/20210809151202.GB1012999@Mem/
 [2] https://reviews.llvm.org/D107882

Differential Revision: https://reviews.llvm.org/D111897
2021-10-18 09:51:19 -07:00
Kazu Hirata 8568ca789e Use llvm::erase_if (NFC) 2021-10-18 09:33:42 -07:00
Kirill Stoimenov 62627c7217 [Sanitizers] Replaced getMaxPointerSizeInBits with getPointerSizeInBits; the former was causing failures for 32-bit x86.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D111829
2021-10-18 09:31:14 -07:00
Mircea Trofin 8612b47a8e [NFC] ProfileSummary: const a bunch of members and fields.
It helps readability and maintainability (no need to chase down
writes to a field that is const, for example)
2021-10-18 08:55:06 -07:00
Gil Rapaport 1156bd4fc3 [LV] Record memory widening decisions (NFCI)
Record widening decisions for memory operations within the planned recipes and
use the recorded decisions in code-gen rather than querying the cost model.

Differential Revision: https://reviews.llvm.org/D110479
2021-10-18 18:03:35 +03:00
Jessica Clarke f5755c0849 [Mips] Add glue between CopyFromReg, CopyToReg and RDHWR nodes for TLS
The MIPS ABI requires the thread pointer be accessed via rdhwr $3, $29.
This is currently represented by (CopyToReg $3, (RDHWR $29)) followed by
a (CopyFromReg $3). However, there is no glue between these, meaning
scheduling can break those apart. In particular, PR51691 is a report
where PseudoSELECT_I was moved to between the CopyToReg and CopyFromReg,
and since its expansion uses branches, it split the def and use of the
physical register between two basic blocks, resulting in the def being
eliminated and the use having no def. It also seems possible that a
similar situation could arise splitting up the CopyToReg from the RDHWR,
causing the RDHWR to use a destination register other than $3, violating
the ABI requirement.

Thus, add glue between all three nodes to ensure they aren't split up
during instruction selection. No regression test is added since any test
would be implicitly relying on specific scheduling behaviour, so whilst
it might be testing that glue is preventing reordering today, changes to
scheduling behaviour could result in the test no longer being able to
catch a regression here, as the reordering might no longer happen for
other unrelated reasons.

Fixes PR51691.

Reviewed By: atanasyan, dim

Differential Revision: https://reviews.llvm.org/D111967
2021-10-18 15:10:20 +01:00
Kerry McLaughlin ac4e01ea0e [SVE][CodeGen] Fix predicate for add/sub + element count patterns
The patterns added in D111441 should use the HasSVEorStreamingSVE
predicate. This changes one incorrect use of HasSVE with the new
patterns.
2021-10-18 14:42:29 +01:00
Andrew Wei f5056c8c16 [AArch64] Improve shuffle vector by using wider types
Try to widen the element type to get a new mask value for a better permutation
sequence, so that we can use NEON shuffle instructions, such as zip1/2,
uzp1/2, trn1/2, rev, ins, etc.
For example:
  shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 2, i32 3>
is equivalent to:
  shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1>
Finally, we can get:
  mov     v0.d[0], v1.d[1]

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D111619
2021-10-18 21:24:45 +08:00
Sanjay Patel 2a3cc4d461 [Analysis] add utility function for unary shuffle mask creation
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891
2021-10-18 09:00:39 -04:00
Simon Pilgrim f041338153 [X86][Costmodel] Add SSE2 sub-128bit vXi32/f32 stride 2 interleaved store costs
Differential Revision: https://reviews.llvm.org/D111941
2021-10-18 13:46:10 +01:00
Simon Pilgrim c850d5c5c8 [X86][Costmodel] Add SSE2 sub-128bit vXi8/16 stride 2 interleaved store costs
Differential Revision: https://reviews.llvm.org/D111941
2021-10-18 13:15:14 +01:00
Stephen Tozer b9ca73e1a8 [DebugInfo] Correctly handle arrays with 0-width elements in GEP salvaging
Fixes an issue where GEP salvaging did not properly account for GEP
instructions which stepped over array elements of width 0 (effectively a
no-op). This unnecessarily produced long expressions by appending
`... + (x * 0)` and potentially extended the number of SSA values used
in the dbg.value. This also erroneously triggered an assert in the
salvage function that the element width would be strictly positive.
These issues are resolved by simply ignoring these useless operands.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D111809
2021-10-18 12:01:12 +01:00
Peter Waller c4603a8a43 [InstCombine][DebugInfo] Remove superfluous assertion, add test
When this code was added, an unnecessary assertion slipped in which we
now hit in real code.

Add a test to defend against it firing again.
2021-10-18 11:00:01 +00:00
Jay Foad d55db4b033 [AMDGPU] Remove unused VirtRegMap analysis. NFC. 2021-10-18 11:55:40 +01:00
Max Kazantsev baad10c09e Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits"
This reverts commit fa16329ae0.

See comments in the discussion. It was merged by mistake, without
fully understanding what the problem was.
2021-10-18 17:14:11 +07:00
Jay Foad a129932b0d [AMDGPU] Add link to bug 2021-10-18 10:33:42 +01:00
Jeremy Morse ea970661dc Fix signed/unsigned comparison after b5426ced71
gcc11 warns that this counter causes a signed/unsigned comparison when it's
later compared with a SmallVector::difference_type. gcc appears to be
correct; clang does not warn one way or the other.
2021-10-18 10:28:52 +01:00
Jay Foad 012248b0bc Remove the verifyAfter mechanism that was replaced by D111397
Differential Revision: https://reviews.llvm.org/D111872
2021-10-18 10:26:46 +01:00
Jay Foad 36deb9a670 Add new MachineFunction property FailsVerification
TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this is used in generic code in TargetPassConfig itself to skip
verification after a generic pass, only because some previous target-
specific pass damaged the MIR on that specific target. This is bad
because problems in one target cause lack of verification for all
targets.

This patch replaces that mechanism with a new MachineFunction property
called "FailsVerification" which can be set by (usually target-specific)
passes that are known to introduce problems. Later passes can reset it
again if they are known to clean up the previous problems.

Differential Revision: https://reviews.llvm.org/D111397
2021-10-18 10:26:46 +01:00
Piotr Sobczak d869921004 [AMDGPU] Add patterns for i8/i16 local atomic load/store
Add patterns for i8/i16 local atomic load/store.

Added tests for new patterns.

Copied atomic_[store/load]_local.ll to GlobalISel directory.

Differential Revision: https://reviews.llvm.org/D111869
2021-10-18 11:23:10 +02:00
Fraser Cormack 3d850d03ae [SelectionDAG] Fix illegal widening of scalable-vector loads
The process of widening simple vector loads attempts to use a load of a
wider vector type if the original load is sufficiently aligned to avoid
memory faults.

However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale
happens to be 1).

This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better
matches how widening non-simple loads, insufficiently-aligned loads, and
scalable-vector stores are handled.

Patches will be introduced later by which loads and stores can be
widened on targets with support for masked or predicated operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111885
2021-10-18 10:00:00 +01:00
Luo, Yuanke 942536ac08 [X86] Prefer VEX encoding in X86 assembler.
This patch orders the AVX instructions ahead of the AVX512 instructions
in the matching table so that the AVX instructions are matched first.
Thanks to Craig and Shengchen for the idea.

Differential Revision: https://reviews.llvm.org/D111538
2021-10-18 16:54:11 +08:00
Stanislav Mekhanoshin 7cdb1df8c7 [AMDGPU] Divergence driven selection for fused bitlogic
The change adds divergence predicates for fused logical operations.
The problem with selecting a scalar fused op such as S_NOR_B32 is
that it does not have a VALU counterpart and will be split in
moveToVALU. At the same time it prevents selection of a better
opcode on the VALU side (such as V_OR3_B32) which does not have a
counterpart on SALU side.

XNOR opcodes are left as is and selected as scalar to get advantage
of the SIInstrInfo::lowerScalarXnor() code which can commute
operations to keep one of two opcodes on SALU if possible. See
xnor.ll test for this.

Differential Revision: https://reviews.llvm.org/D111907
2021-10-18 01:44:25 -07:00
Raphael Isemann de4d2f80b7 Fix cyclic header dependency between Support<->Option due to RISCVISAInfo
This was introduced in D105168 which added RISCVISAInfo.h.
2021-10-18 10:06:11 +02:00
Jingu Kang 3f0b178de2 [AArch64] Fixed a bug on AArch64MIPeepholeOpt
Create a new virtual register for the definition of the new AND instruction
and replace the old register with the new one to preserve SSA form.

Differential Revision: https://reviews.llvm.org/D109963
2021-10-18 08:55:42 +01:00
Bing1 Yu f383c53311 [MachineSink] Compile time improvement for large testcases which has many kill flags
We ran an experiment and observed a dramatic decrease in the compile time spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05

After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):3.987371e+03

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D111688
2021-10-18 15:44:07 +08:00
Qiu Chaofan 67c64d8337 [PowerPC] Implement scheduling model for Power10
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D110855
2021-10-18 15:27:49 +08:00
Max Kazantsev fa16329ae0 [NFC] [LoopPeel] Change the way DT is updated for loop exits
When peeling a loop, we assume that the latch has a `br` terminator and
that all loop exits are either terminated with an `unreachable` or have
a terminating deoptimize call. So when we peel off the 1st iteration, we
change the IDom of all loop exits to the peeled copy of
`NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support
loops with exits that are followed by a block with an `unreachable` or a
terminating deoptimize call, changing the exit's idom wouldn't be enough
and DT would be broken.

For example, let `Exit1` and `Exit2` be loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So
neither of the exits dominates this unreachable block. If we change the
IDoms of the exits to some peeled loop block, we don't update the
dominators of the unreachable block. Currently we just don't get to the
peeling logic, saying that we can't peel such loops.

With this NFC we just insert edges from cloned exiting blocks to their
exits after peeling each iteration (we accumulate the insertion updates
and then after peeling apply the updates to DT).

This patch was a part of D110922.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev
2021-10-18 10:23:05 +07:00
Simon Pilgrim 0bb32b1b21 [X86][SLM] Fix BitTest+Set uops + port usage
Both ports are required for BitTest ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures and what Intel AoM / Agner reports as well.
2021-10-17 18:13:15 +01:00
Simon Pilgrim 5ed5df4802 [X86][SLM] Fix uops for PCMPISTR instructions
Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.
2021-10-17 18:13:14 +01:00
Simon Pilgrim 680afaaa5d [X86][SLM] Fix uops for PCLMULQDQ
Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.
2021-10-17 18:13:14 +01:00
Simon Pilgrim 498c7236bc [X86][SLM] +1uop for PSHUFBrm xmm
Extra 1uop for folded pshufb ops, based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.
2021-10-17 18:13:14 +01:00
Nikita Popov 274b2439f8 [ConstantRange] Add fast signed multiply
The multiply() implementation is very slow -- it performs six
multiplications in double the bitwidth, which means that it will
typically work on allocated APInts and bypass fast-path
implementations. Add an additional implementation that doesn't
try to produce anything better than a full range if overflow is
possible. At least for the BasicAA use-case, we really don't care
about more precise modeling of overflow behavior. The current
use of multiply() is fine while the implementation is limited to
a single index, but extending it to the multiple-index case makes
the compile-time impact untenable.
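
A minimal sketch of the policy, assuming a plain [lo, hi] interval model
rather than ConstantRange's wrapped sets, and using the GCC/Clang
__builtin_mul_overflow builtin (all names here are illustrative):

  #include <algorithm>
  #include <cstdint>
  #include <optional>
  #include <utility>

  using Range = std::pair<int64_t, int64_t>; // [lo, hi]

  // Compute the endpoint products at the original width; if any product
  // overflows, give up so the caller falls back to the full range.
  std::optional<Range> mulFast(Range A, Range B) {
    const int64_t Xs[2] = {A.first, A.second}, Ys[2] = {B.first, B.second};
    int64_t P[4];
    for (int I = 0; I < 4; ++I)
      if (__builtin_mul_overflow(Xs[I / 2], Ys[I % 2], &P[I]))
        return std::nullopt; // possible overflow: no precise modeling
    return Range{*std::min_element(P, P + 4), *std::max_element(P, P + 4)};
  }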
2021-10-17 16:41:49 +02:00
Roman Lebedev 91373bf12e
[X86][Costmodel] Load/store i64 Stride=4 VF=16 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them,
but they all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/9bnKrefcG - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So could pick cost of `40`

For store we have:
https://godbolt.org/z/5s3s14dEY - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So we could pick cost of `40`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111945
2021-10-17 17:28:10 +03:00
Roman Lebedev 3274ce3a28
[X86][Costmodel] Load/store i64 Stride=2 VF=32 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them,
but they all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/MTaKboejM - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0`
So could pick cost of `32`

For store we have:
https://godbolt.org/z/v7xPj3Wd4 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111944
2021-10-17 17:28:10 +03:00
Roman Lebedev 3a6a9f74d3
[X86][Costmodel] Load/store i32 Stride=4 VF=32 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them,
but they all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/11rcvdreP - for intels `Block RThroughput: <=68.0`; for ryzens, `Block RThroughput: <=48.0`
So could pick cost of `68`

For store we have:
https://godbolt.org/z/6aM11fWcP - for intels `Block RThroughput: <=64.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `64`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111943
2021-10-17 17:28:09 +03:00
Roman Lebedev 4b76a74b42
[X86][Costmodel] Load/store i32 Stride=3 VF=32 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them,
but they all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/s5b6E6jsP - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0`
So could pick cost of `32`

For store we have:
https://godbolt.org/z/efh99d93b - for intels `Block RThroughput: <=48.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `48`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111942
2021-10-17 17:28:09 +03:00
Roman Lebedev 887acf6842
[X86][Costmodel] Load/store i16 Stride=6 VF=32 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them,
but they all require a lot of manual assembly surgery.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/YTeT9M7fW - for intels `Block RThroughput: <=212.0`; for ryzens, `Block RThroughput: <=64.0`
So could pick cost of `212`

For store we have:
https://godbolt.org/z/vc954KEGP - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=24.0`
So we could pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111940
2021-10-17 17:28:09 +03:00
David Truby 2e0fb007d6 [llvm][AArch64][SVE] Fold literals into math instructions
SVE has predicated literal forms of some instructions for specific
literals, which currently are generated correctly when using ACLE
but not when those instructions are generated directly.

This adds the patterns to generate those instructions when
generating from standard LLVM IR instructions.

Differential Revision: https://reviews.llvm.org/D99074
2021-10-17 10:57:04 +00:00
Kito Cheng ff13189c5d [RISCV] Unify the arch string parsing logic to RISCVISAInfo.
How many places do you need to modify when implementing a new extension for RISC-V?

At least 7 places, as far as I know:

- Add new SubtargetFeature at RISCV.td
- -march parser in RISCV.cpp
- RISCVTargetInfo::initFeatureMap@RISCV.cpp for handling the feature vector.
- RISCVTargetInfo::getTargetDefines@RISCV.cpp for predefined macros.
- Arch string parser for ELF attribute in RISCVAsmParser.cpp
- ELF attribute emission in RISCVAsmParser.cpp, and make sure it's in
  canonical order...
- ELF attribute emission in RISCVTargetStreamer.cpp, and again, it must
  be in canonical order...

And now, this patch provides a unified infrastructure for handling (almost)
everything related to the RISC-V arch string.

After this patch, you only need to update 2 places to implement an extension
for RISC-V:
- Add new SubtargetFeature at RISCV.td, hmmm, it's hard to avoid.
- Add new entry to RISCVSupportedExtension@RISCVISAInfo.cpp or
  SupportedExperimentalExtensions@RISCVISAInfo.cpp.

Most of the code comes from the existing -march parser, but with a few new
features/bug fixes:
- Accept version for -march, e.g. -march=rv32i2p0.
- Reject version info with `p` but without minor version number like `rv32i2p`.

Differential Revision: https://reviews.llvm.org/D105168
2021-10-17 16:25:23 +08:00
Ben Shi d0dbc991c0 Revert "[AArch64] Optimize add/sub with immediate"
This reverts commit 9bf6bef995.
2021-10-16 22:17:18 +00:00
Fangrui Song 8e1d532707 [Object] Simplify RELR decoding 2021-10-16 15:03:14 -07:00
Craig Topper beb7862db5 [X86] Add DAG combine for negation of CMOV absolute value pattern.
This patch detects the absolute value pattern on the RHS of a
subtract. If we find it we swap the CMOV true/false values and
replace the subtract with an ADD.

There may be a more generic way to do this, but I'm not sure.
Targets that don't have legal or custom ISD::ABS use a generic
expand in DAG combiner already when it sees (neg (abs(x))). I
haven't checked what happens if the neg is a more general subtract.

Fixes PR50991 for X86.

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D111858
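
A scalar sketch of the rewrite (illustrative C++, not the DAG code):

  #include <cstdint>

  // Before: subtract of an absolute value computed via CMOV.
  int32_t before(int32_t z, int32_t x) { return z - (x < 0 ? -x : x); }

  // After: swap the CMOV true/false values and turn the SUB into an ADD.
  int32_t after(int32_t z, int32_t x) { return z + (x < 0 ? x : -x); }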
2021-10-16 13:31:43 -07:00
Nikita Popov 492a4a428f [APInt] Fix 1-bit edge case in smul_ov()
The sdiv used to check for overflow can itself overflow if the
LHS is signed min and the RHS is -1. The code tried to account for
this by also checking the commuted version. However, for 1-bit
values, signed min and -1 are the same value, so both divisions
overflow. As such, the overflow for -1 * -1 was not detected
(which results in -1 rather than 1 for 1-bit values). Fix this by
explicitly checking for this case instead.

Noticed while adding exhaustive test coverage for smul_ov(),
which is also part of this commit.
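
A hedged sketch of the edge case using plain ints rather than APInt: in a
1-bit signed type the only values are 0 and -1, so signed-min and -1
coincide and both division-based checks overflow.

  #include <cassert>

  // -1 * -1 is the only 1-bit product that overflows (the result, 1, is
  // not representable), and -1 / -1 overflows too, so a division-based
  // check cannot detect it; the fix tests this pair explicitly.
  bool smulOverflows1Bit(int Lhs, int Rhs) {
    return Lhs == -1 && Rhs == -1;
  }

  int main() {
    assert(smulOverflows1Bit(-1, -1));
    assert(!smulOverflows1Bit(-1, 0) && !smulOverflows1Bit(0, 0));
    return 0;
  }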
2021-10-16 20:31:04 +02:00
Simon Pilgrim 85b87179f4 [TTI][X86] Add v8i16 -> 2 x v4i16 stride 2 interleaved load costs
Split SSE2 and SSSE3 costs to correctly handle PSHUFB lowering - as was noted on D111938
2021-10-16 17:28:07 +01:00
Simon Pilgrim 6ec644e215 [TTI][X86] Add SSE2 sub-128bit vXi16/32 and v2i64 stride 2 interleaved load costs
These cases use the same codegen as AVX2 (pshuflw/pshufd) for the sub-128bit vector deinterleaving, and unpcklqdq for v2i64.

It's going to take a while to add full interleaved cost coverage, but since these are the same for SSE2 -> AVX2 it should be an easy win.

Fixes PR47437

Differential Revision: https://reviews.llvm.org/D111938
2021-10-16 16:21:45 +01:00
Martin Storsjö 6c96ceabaf [Support] Add more Windows error codes to mapWindowsError
Also sort ERROR_BAD_NETPATH correctly.

Compared with the similar error code mapping in
libcxx/src/filesystem/operations.cpp, I'm leaving out
mappings for ERROR_NOT_SAME_DEVICE and ERROR_OPERATION_ABORTED.
They map nicely to std::errc::cross_device_link and
std::errc::operation_canceled, but those aren't available in
llvm::errc, as they aren't available across all platforms.

Also, the libcxx version maps ERROR_INVALID_NAME to
no_such_file_or_directory instead of invalid_argument.

Differential Revision: https://reviews.llvm.org/D111874
2021-10-16 16:14:49 +03:00
Tomasz Miąsko 48ce523a26 [Symbolize] Demangle Rust symbols
Add support for demangling Rust v0 symbols to LLVM symbolizer by reusing
nonMicrosoftDemangle which supports both Itanium and Rust mangling.

Reviewed By: dblaikie, jhenderson

Part of https://reviews.llvm.org/D110664
2021-10-16 13:32:17 +02:00
Tomasz Miąsko 41a6fc8438 [Demangle] Extract nonMicrosoftDemangle from llvm::demangle
Introduce a new demangling function that supports symbols using Itanium
mangling and Rust v0 mangling, and is expected in the near future to
include support for D mangling as well.

Unlike llvm::demangle, the function does not accept extra underscore
decoration. The callers generally know exactly when symbols should
include the extra decoration and so they should be responsible for
stripping it.

Functionally the only intended change is to allow demangling Rust
symbols with an extra underscore decoration through llvm::demangle,
which matches the existing behaviour for Itanium symbols.

Reviewed By: dblaikie, jhenderson

Part of https://reviews.llvm.org/D110664
2021-10-16 13:32:16 +02:00
Simon Pilgrim d464a9d476 [Analysis] Replace assert(isa)/dyn_cast with cast. NFC.
cast<> will perform the assertion for us.

Removes a static analysis null dereference warning.
2021-10-16 11:40:19 +01:00
Simon Pilgrim a1b43d2bc9 [LazyValueInfo] getPredicateAt - remove unnecessary null pointer check. NFC.
We already dereference the CxtI pointer several times before reaching the "if (CxtI)", so there is no need to check it again.

Fixes a coverity warning.
2021-10-16 11:20:19 +01:00
Simon Pilgrim c288241795 [ConstantFolding] ConstantFoldScalarCall2 - early-out if getLibFunc fails. NFC. 2021-10-16 11:20:19 +01:00
Simon Pilgrim c18cf10a04 [ConstantFolding] Use getValueAPF const ref value where possible. NFC.
Don't copy the value if we can avoid it.
2021-10-16 11:20:19 +01:00
Simon Pilgrim 76ca0d67ab [ConstantFolding] ConstantFoldScalarCall1 - early-out if getLibFunc fails. NFC. 2021-10-16 11:20:18 +01:00
Roman Lebedev d137f1288e
[X86][LV] X86 does *not* prefer vectorized addressing
And another attempt to start untangling this ball of threads around gather.
There's the `TTI::prefersVectorizedAddressing()` hoop, which confusingly defaults to `true`
and tells LV to try to vectorize the addresses that lead to loads.
But X86 generally cannot deal with vectors of addresses;
the only instructions that support that are GATHER/SCATTER,
and even those aren't available until AVX2, and aren't really usable until AVX512.

This specializes the hook for X86, to return true only if we have AVX512 or AVX2 w/ fast gather.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111546
2021-10-16 12:32:18 +03:00
Ben Shi 9bf6bef995 [AArch64] Optimize add/sub with immediate
Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Reviewed By: jaykang10, dmgreen

Differential Revision: https://reviews.llvm.org/D111034
2021-10-16 08:50:39 +00:00
Zhi An Ng da07942834 [WebAssembly] Add prototype relaxed laneselect instructions
Add i8x16, i16x8, i32x4, i64x2 laneselect instructions. These are only
exposed as builtins, and require user opt-in.
2021-10-15 17:45:09 -07:00
Anshil Gandhi 1830ec94ac Revert "[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols"
This reverts commit 03375a3fb3.
2021-10-15 16:16:18 -06:00
Nikita Popov 587493b441 [ConstantRange] Compute precise shl range for single elements
For the common case where the shift amount is constant (a single
element range) we can easily compute a precise range (up to
unsigned envelope), so do that.
2021-10-15 23:44:41 +02:00
Sanjay Patel a49f5386ce [InstCombine] generalize fold for mask-with-signbit-splat, part 2
This removes an over-specified fold. The more general transform
was added with:
727e642e97

There's a difference on an existing test that shows a potentially
unnecessary use limit on an icmp fold.

That fold is in InstCombinerImpl::foldICmpSubConstant(), and IIRC
there was some back-and-forth on it and similar folds because they
could cause analysis/passes (SCEV, LSR?) to miss optimizations.

Differential Revision: https://reviews.llvm.org/D111410
2021-10-15 17:11:29 -04:00
Sanjay Patel 727e642e97 [InstCombine] generalize fold for mask-with-signbit-splat
(iN X s>> (N-1)) & Y --> (X < 0) ? Y : 0

https://alive2.llvm.org/ce/z/qeYhdz

I was looking at a missing abs() transform and found my way to this
generalization of an existing fold that was added with D67799.
As discussed in that review, we want to make sure codegen handles
this difference well, and for all of the targets/types that I
spot-checked, it looks good.

I am leaving the existing fold in place in this commit because
it covers a potentially missing icmp fold, but I plan to remove
that as a follow-up commit as suggested during review.

Differential Revision: https://reviews.llvm.org/D111410
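
A scalar check of the fold for N=32 (illustrative C++; x >> 31 relies on
the usual arithmetic right shift for signed values):

  #include <cstdint>

  // (i32 X s>> 31) & Y: the sign-bit splat masks Y.
  uint32_t before(int32_t x, uint32_t y) { return uint32_t(x >> 31) & y; }

  // Folded form: (X < 0) ? Y : 0.
  uint32_t after(int32_t x, uint32_t y) { return x < 0 ? y : 0; }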
2021-10-15 16:25:48 -04:00
Nikita Popov 0c52c271a5 [BasicAA] Rename ExtendedValue to CastedValue (NFC)
As suggested on D110977, rename ExtendedValue to CastedValue,
because it will contain more than just extensions in the future.
2021-10-15 21:56:54 +02:00
Florian Hahn 4a1d63d7d0
[VectorCombine] Add option to only run scalarization transforms.
This patch adds a pass option to only run transforms that scalarize
vector operations and do not create new vector instructions.

When running VectorCombine early in the pipeline introducing new vector
operations can have negative effects, like blocking loop or SLP
vectorization. To avoid regressions, restrict the early VectorCombine
run (when using -enable-matrix) to only perform scalarization and not
introduce new vector operations.

This is done as option to the pass directly, which is then set when
adding the pass to the pipeline. This is done for the new pass manager
only.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D111800
2021-10-15 20:35:58 +01:00
Sam Clegg 659a08399a [WebAssembly] Add import info to `dylink` section of shared libraries
See https://github.com/WebAssembly/tool-conventions/pull/175

Differential Revision: https://reviews.llvm.org/D111345
2021-10-15 11:49:16 -07:00
Mingming Liu cfd155c41b [SelectionDAG] Fix typo in option help
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D111867
2021-10-15 11:27:40 -07:00
Anshil Gandhi 03375a3fb3 [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
By default clang emits complete constructors as aliases of base constructors if they are the same.
The backend is supposed to emit symbols for the alias; otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Reviewed By: yaxunl, #amdgpu

Differential Revision: https://reviews.llvm.org/D109707
2021-10-15 11:39:15 -06:00
Michael Liao bacddf47a8 [amdgpu] Fix a crash case when preserving MDT in SILowerControlFlow
- When a redundant MBB is being erased from MDT, check whether its
  single successor is dominated by it. If yes, update that successor's
  idom before erasing MBB; otherwise, it implies MBB is a leaf node and
  could be erased directly.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D111831
2021-10-15 13:21:53 -04:00
Jonas Paulsson ccbfcfda1e [SystemZ] Handle huge immediates in SystemZInstrInfo::loadImmediate().
This is needed during isel pseudo expansion in order not to crash on huge
immediates.

Review: Ulrich Weigand
2021-10-15 19:08:45 +02:00
Jessica Paquette 59b94c4a60 NFC: Remove wayward MIR tests from lib/Target
These were put in lib/Target instead of tests.

Thankfully dupes of them already existed in the tests directory.

So, just delete them.
2021-10-15 09:59:00 -07:00
Ellis Hoag aa80034ab9 [DebugInfo] retainedTypes should not have subprograms
After D80369, the retainedTypes in CU's should not have any subprograms
so we should not handle that case when emitting debug info.

Differential Revision: https://reviews.llvm.org/D111593
2021-10-15 12:42:25 -04:00
Stephen Tozer f5ed223b0f [DebugInfo] Limit the size of DIExpressions that we will salvage up to
Fixes: https://bugs.llvm.org/show_bug.cgi?id=51841

This patch places an arbitrary limit on the size of DIExpressions that
we will produce via salvaging, for performance reasons. This helps to
fix a performance issue observed in the bug above, in which debug values
would be salvaged hundreds of times, producing expressions with over
1000 elements and causing the compiler to hang. Limiting the size of
debug values that we will produce to 128 largely fixes this issue.

Reviewed By: dblaikie, jmorse

Differential Revision: https://reviews.llvm.org/D110332
2021-10-15 17:34:21 +01:00
Craig Topper 24703cb6a4 [IR] Fix a few incorrect paths in file header comments. NFC 2021-10-15 09:18:57 -07:00
Abinav Puthan Purayil de3038400b [AMDGPU] Avoid redundant calls to numBits in AMDGPUCodeGenPrepare::replaceMulWithMul24().
isU24() and isI24() call numBits() to make their decision. This change
replaces them with the internal numBits() call so that we can use its
result for the > 32-bit width cases.

Differential Revision: https://reviews.llvm.org/D111864
2021-10-15 19:49:44 +05:30
Dávid Bolvanský 6678db00e6 [X86] Enable promotion of i16 popcnt (PR52056)
Solves https://bugs.llvm.org/show_bug.cgi?id=52056

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111507
2021-10-15 15:41:37 +02:00
Mubashar Ahmad 97809c828f [AArch64]Enabling Cortex-A510 Support
This patch enables support for Cortex-A510 CPUs.

Reviewed By: MarkMurrayARM, dmgreen

Differential Revision: https://reviews.llvm.org/D109825
2021-10-15 14:31:18 +01:00
Abinav Puthan Purayil 0379263f23 [AMDGPU] Fix width check for signed mul24 generation.
This change fixes a case in which the highest set bit of the original
result is at bit 31 and sign-extending the mul24 for it would make the
result negative.

Differential Revision: https://reviews.llvm.org/D111823
2021-10-15 18:53:41 +05:30
Anton Afanasyev 7b07c01351 [InstCombine] Support arbitrary const shift amount for `lshr (sext i1 ...)`
Add the lshr (sext i1 X to iN), C --> select (X, -1 >> C, 0) case. This expands
the C == N-1 case to arbitrary C.

Fixes PR52078.

Reviewed By: spatel, RKSimon, lebedev.ri

Differential Revision: https://reviews.llvm.org/D111330
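
In scalar terms the generalized fold looks like this (a sketch, with i32
standing in for iN):

  #include <cstdint>

  // lshr (sext i1 X to i32), C: sign-extend the bool to all-ones or zero,
  // then shift right logically.
  uint32_t before(bool x, unsigned c) {
    uint32_t splat = x ? 0xFFFFFFFFu : 0u; // sext i1 -> i32
    return splat >> c;                     // lshr ..., C
  }

  // Folded form: select (X, -1 >> C, 0).
  uint32_t after(bool x, unsigned c) { return x ? (0xFFFFFFFFu >> c) : 0u; }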
2021-10-15 13:39:13 +03:00
David Green fa1a68285e [AArch64] Improve fptosi.sat vector lowering
Similar to D111236, this improves the lowering of vector fptosi.sat and
fptoui.sat, using legal converts and further saturating from there with
min/max. f64 are excluded for the moment due to producing worse code in
places compared to the unrolling.

Differential Revision: https://reviews.llvm.org/D111787
2021-10-15 11:37:53 +01:00
David Green a4f42a33be [AArch64] Improve fptosi.sat lowering
Improve the lowering of scalar fptosi.sat and fptoui.sat for saturating
widths smaller than legal types by using the fact that the legal type
will saturate under aarch64, and saturating the result further using
min/max.

Differential Revision: https://reviews.llvm.org/D111236
2021-10-15 11:12:23 +01:00
John Brawn 082fa56819 [ARM] Fix MOVCC peephole to not use an incorrect register class
The MOVCC peephole eliminates a MOVCC by making one of its inputs a
conditional instruction, but when doing this it should be using both
inputs of the MOVCC to decide on the register class to use as
otherwise we can get an error when using -verify-machineinstrs.

Differential Revision: https://reviews.llvm.org/D111714
2021-10-15 10:54:26 +01:00
djtodoro 8c3adce81d [JSON] Handle uint64_t type
There was no handling of uint64_t in the LLVM JSON library.
This patch adds support for that. The motivation is
the https://reviews.llvm.org/D109217.

Differential Revision: https://reviews.llvm.org/D109347
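
A hedged usage sketch; the uint64_t constructor and the getAsUINT64()
accessor are inferred from the patch description and may not match the
final API exactly:

  #include "llvm/Support/JSON.h"

  void roundTrip() {
    uint64_t Big = 18446744073709551615ULL; // UINT64_MAX, > INT64_MAX
    llvm::json::Value V = Big;              // stored without truncation
    if (auto U = V.getAsUINT64())           // assumed accessor name
      (void)*U;
  }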
2021-10-15 11:18:22 +02:00
guopeilin b092dc0bb9 [AArch64ISelLowering] Avoid duplane in some cases when sve enabled
Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D110524
2021-10-15 15:33:24 +08:00
Jim Lin 2ccdc7315e [RISCV] Add invalid match case for uimm2, uimm3 and uimm7
Reviewed By: craig.topper, jrtc27

Differential Revision: https://reviews.llvm.org/D110308
2021-10-15 14:54:48 +08:00
Ben Shi 4fe5ab4b00 [RISCV] Optimize immediate materialisation with SH*ADD
Use SH1ADD/SH2ADD/SH3ADD along with LUI+ADDI to compose int32*3,
int32*5 and int32*9.

Reviewed By: craig.topper, luismarques

Differential Revision: https://reviews.llvm.org/D111484
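
The decomposition in scalar terms (a hedged sketch; the multiply folds
into a single shift-add):

  #include <cstdint>

  // C*9 == (C << 3) + C, i.e. one SH3ADD once C is materialised via
  // LUI+ADDI. Likewise C*3 uses SH1ADD and C*5 uses SH2ADD.
  int64_t timesNine(int64_t c) { return (c << 3) + c; } // sh3add rd, c, c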
2021-10-15 06:46:41 +00:00
Kazu Hirata 81e9c90686 [llvm] Use llvm::is_contained (NFC) 2021-10-14 22:44:09 -07:00
Max Kazantsev 90ae538cab [SCEV] Prove implication of predicates to their sign-flipped counterparts
This patch teaches SCEV two implication rules:

  x <u y && y >=s 0 --> x <s y,
  x <s y && y <s 0 --> x <u y.

And all equivalents with signs/parts swapped.

Differential Revision: https://reviews.llvm.org/D110517
Reviewed By: nikic
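
Both rules can be checked exhaustively at a small bit width; a hedged i8
validation sketch (not part of the patch's tests):

  #include <cassert>
  #include <cstdint>

  int main() {
    for (int x = 0; x < 256; ++x)
      for (int y = 0; y < 256; ++y) {
        uint8_t ux = (uint8_t)x, uy = (uint8_t)y;
        int8_t sx = (int8_t)ux, sy = (int8_t)uy;
        if (ux < uy && sy >= 0)
          assert(sx < sy); // x <u y && y >=s 0 --> x <s y
        if (sx < sy && sy < 0)
          assert(ux < uy); // x <s y && y <s 0 --> x <u y
      }
    return 0;
  }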
2021-10-15 11:49:18 +07:00
Qiu Chaofan 9e9b0f4621 [PowerPC] Support ppc-asm-full-reg-names for AIX
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D94282
2021-10-15 12:22:44 +08:00
Max Kazantsev 1202d280c6 [SCEV][NFC] Reduce memory footprint & compile time via DFS refactoring
Current implementations of DFS in SCEV check unique-visited of traversed
values on pop, and not on push. As result, the same value may be pushed
multiple times just to be thrown away when popped. These operations are
meaningless and only waste time and increase memory footprint of the
worklist.

This patch reworks the DFS strategy to check uniqueness before push.
Should be NFC.

Differential Revision: https://reviews.llvm.org/D111774
Reviewed By: nikic, reames
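
A generic worklist sketch of the reworked strategy (illustrative, not the
SCEV code): membership is tested on push, so each value enters the
worklist at most once.

  #include <unordered_set>
  #include <vector>

  void dfs(int Root, const std::vector<std::vector<int>> &Succs) {
    std::vector<int> Worklist;
    std::unordered_set<int> Visited;
    Visited.insert(Root);
    Worklist.push_back(Root);
    while (!Worklist.empty()) {
      int Cur = Worklist.back();
      Worklist.pop_back();
      for (int Next : Succs[Cur])
        if (Visited.insert(Next).second) // uniqueness checked before push
          Worklist.push_back(Next);
    }
  }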
2021-10-15 10:19:15 +07:00
Artur Pilipenko 3f96f7b30c Fix getInlineCost with ComputeFullInlineCost enabled
Fix a bug when getInlineCost incorrectly returns a
cost/threshold pair instead of an explicit never inline.

Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D111687
2021-10-14 17:41:41 -07:00
Hongtao Yu 42ad7e1bc9 [CSSPGO] Turn off PseudoProbeUpdatePass for non-FDO builds.
PseudoProbeUpdatePass is used to distribute sample counts among duplicated probes.  It doesn't make sense for it to run without a sample profile. The pass takes 1% of the build time.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D111847
2021-10-14 17:12:49 -07:00
Alexey Bataev 414abff1fe [SLP]Fix PR52090: clang crashes: Assertion `Index < Length && "Invalid index!"' failed.
Need to check that either Idx is UndefMaskElem and value is UndefValue
or Idx is valid and value is the same as the scalar value in the node.

Differential Revision: https://reviews.llvm.org/D111802
2021-10-14 14:26:29 -07:00
Roman Lebedev 3d7bf6625a
[X86][Costmodel] Improve cost modelling for not-fully-interleaved load
While I've modelled most of the relevant tuples for AVX2,
that only covered fully-interleaved groups.

By definition, interleaving load of stride N means:
load N*VF elements, and shuffle them into N VF-sized vectors,
with 0'th vector containing elements `[0, VF)*stride + 0`,
and 1'th vector containing elements `[0, VF)*stride + 1`.
Example: https://godbolt.org/z/df561Me5E (i64 stride 4 vf 2 => cost 6)

Now, a not-fully-interleaved load is when not all of these vectors are demanded.
So at worst, we could just pretend that everything is demanded,
and discard the non-demanded vectors. What this means is that the cost
for a not-fully-interleaved group should be no greater than the cost
for the same fully-interleaved group, but perhaps somewhat less.
Examples:
https://godbolt.org/z/a78dK5Geq (i64 stride 4 (indices 012u) vf 2 => cost 4)
https://godbolt.org/z/G91ceo8dM (i64 stride 4 (indices 01uu) vf 2 => cost 2)
https://godbolt.org/z/5joYob9rx (i64 stride 4 (indices 0uuu) vf 2 => cost 1)

As we have established over the course of the last ~70 patches (wow),
`BaseT::getInterleavedMemoryOpCost()` is absolutely bogus;
it usually overestimates by almost an order of magnitude,
so I would claim that we should at least use the hardcoded costs
of fully interleaved load groups.

We could go further and adjust them e.g. by the number of demanded indices,
but then I'm somewhat fearful of underestimating the cost.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111174
2021-10-14 23:14:36 +03:00
Craig Topper 79ae9562cc [RISCV] Remove unused member variable. NFC 2021-10-14 12:56:47 -07:00
Nikita Popov 69853f9920 [IVUsers] Move preheader check into SCEVExpander
Rather than checking for loop nest preheaders upfront in IVUsers,
move this requirement into isSafeToExpand() from SCEVExpander.

Historically, LSR did not check whether SCEVs are safe to expand
and fully relied on IVUsers to validate this. Later, support for
non-expandable SCEVs was added via rigid formulas.

Checking this in isSafeToExpand() makes it more obvious what
exactly this check is guarding against, and avoids the awkward
loop nest scan.

This is a followup to https://reviews.llvm.org/D111493#3055286.

Differential Revision: https://reviews.llvm.org/D111681
2021-10-14 21:52:31 +02:00
Craig Topper 3ff9cc01f2 [X86] Use CMOVNS for abs instead of CMOVGE.
CMOVGE reads SF and OF. CMOVNS only reads SF. This matches with
other recent changes to use a single flag where possible. It also
matches gcc codegen.

I believe this technically changes whether the conditional move happens
on INT_MIN, but for INT_MIN both registers are the same so it doesn't
matter.

Differential Revision: https://reviews.llvm.org/D111826
2021-10-14 12:28:28 -07:00
Nikita Popov 5f05ff081f [BasicAA] Improve scalable vector handling
Currently, DecomposeGEP() bails out on the whole decomposition if
it encounters a scalable GEP type anywhere. However, it is fine to
still analyze other GEPs that we look through before hitting the
scalable GEP. This does mean that the decomposed GEP base is no
longer required to be the same as the underlying object. However,
I don't believe this property is necessary for correctness anymore.

This allows us to compute slightly more precise aliasing results
for GEP chains containing scalable vectors, though my primary
interest here is simplifying the code.

Differential Revision: https://reviews.llvm.org/D110511
2021-10-14 20:23:50 +02:00
Simon Pilgrim 871f773986 [TTI][X86] Merge getInterleavedMemoryOpCostAVX2 into getInterleavedMemoryOpCost. NFC
This is an NFC refactor patch to merge the AVX2 interleaved cost handling back into the getInterleavedMemoryOpCost base method - while getInterleavedMemoryOpCostAVX512 uses instructions and patterns very specific to AVX512+, much of the cost analysis for AVX2 can be reused for all SSE targets.

This is the first step towards improving SSE and AVX1 costs that will reuse the relevant AVX2 costs by splitting some of the tables - for instance AVX1 has very similar costs for most vXi64/vXf64 interleave patterns and many sub-128bit vector costs are the same all the way down to SSE2 (or at least SSSE3).

Differential Revision: https://reviews.llvm.org/D111822
2021-10-14 18:46:25 +01:00
Simon Pilgrim fcbec7e668 [TTI][X86] Swap getInterleavedMemoryOpCostAVX2/getInterleavedMemoryOpCostAVX512 implementations. NFC.
I have some upcoming refactoring for SSE/AVX1 interleaving cost support, and the diff is a lot nicer if the (unaltered) AVX512 implementation isn't stuck between getInterleavedMemoryOpCost and getInterleavedMemoryOpCostAVX2
2021-10-14 18:10:03 +01:00
Simon Pilgrim 13185f0154 [Transforms] eliminateDeadStores - remove unused variable. NFC.
The initial MemoryAccess *Current assignment is never used, and all other uses are initialized/used within the worklist loop (and not across multiple iterations) - so move the variable internal to the loop.

Fixes scan-build unused assignment warning.
2021-10-14 18:10:03 +01:00
Kevin P. Neal 727a891ec8 [FPEnv][InstSimplify] Fold fadd X, 0 ==> X, when we know X is not -0
Currently the fadd optimizations in InstSimplify don't know how to do this
NoSignedZeros "X + 0.0 ==> X" fold when using the constrained intrinsics.
This adds the support.

This review is derived from D106362 with some improvements from D107285
and is a follow-on to D111085.

Differential Revision: https://reviews.llvm.org/D111450
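
A small C++ demonstration of why the NoSignedZeros condition matters (an
illustration, not the InstSimplify code):

  #include <cstdio>

  // Under the default rounding mode, -0.0 + 0.0 evaluates to +0.0, so
  // folding "x + 0.0 ==> x" would change the sign of zero when x == -0.0.
  int main() {
    double x = -0.0;
    std::printf("%g\n", x + 0.0); // prints 0 (positive zero), not -0
    return 0;
  }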
2021-10-14 12:32:45 -04:00
Craig Topper f7ba572483 [RISCV] Update Zba, Zbb, Zbc, and Zbs version from 0.93 to 1.0.
I've removed the Zbs W instructions that are not part of the frozen spec.

References to B as an extension name have been removed. Tests are updated or split accordingly.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D110669
2021-10-14 09:25:03 -07:00
Nikita Popov a8e7d11aca [ValueTracking] Simplify getKnowledgeValidInContext() call (NFC)
This accepts an ArrayRef, there's no need to create a SmallVector.
2021-10-14 18:17:54 +02:00
luxufan 849b36bf6f [JITLink][NFC] Add TableManager to replace PerGraph...Builder pass
This patch adds a TableManager which is responsible for fixing edges that need entries to reference the target symbol, and for constructing such entries.

In the past, the PerGraphGOTAndPLTStubsBuilder pass was used to build GOT and PLT entries, and the PerGraphTLSInfoEntryBuilder pass was used to build TLSInfo entries. By generalizing the behavior of building entries, I added a TableManager which can be reused to build GOT, PLT and TLSInfo entries.

If this patch makes sense and can be accepted, I will apply the TableManager to other targets (MachO_x86_64, MachO_arm64, ELF_riscv) and delete the file PerGraphGOTAndPLTStubsBuilder.h.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D110383
2021-10-14 23:09:53 +08:00
Jeremy Morse b5426ced71 [DebugInfo][InstrRef] Place variable-values PHI using LLVM utilities
This patch is very similar to D110173 / a3936a6c19, but for variable
values rather than machine values. This is for the second instr-ref
problem, calculating the correct variable value on entry to each block.
The previous lattice-based implementation was broken; we now use LLVM's
existing PHI placement utilities to work out where values need to merge,
then eliminate unnecessary ones through value propagation.

Most of the deletions here happen in vlocJoin: it was trying to pick a
location for PHIs to happen in, badly, leading to an infinite loop in the
MIR test added, where it would repeatedly switch between register
locations. The new approach is simpler: either PHIs can be eliminated, or
they can't, and the location of the value is a different problem.

Various bits and pieces move to the header so that they can be tested in
the unit tests. The DbgValue class grows a "VPHI" kind to represent
variable value PHIs that haven't been eliminated yet.

Differential Revision: https://reviews.llvm.org/D110630
2021-10-14 14:43:43 +01:00
Brian Cain 743e263e08 [hexagon] Add system register, transfer support
This commit adds the system reg/regpair definitions and the corresponding
register transfer instructions.
2021-10-14 06:37:04 -07:00
Andrew Savonichev 51eefa8164 [NVPTX] Add VRFrame and VRFrameLocal to integer register classes
These registers are used as operands for instructions that expect an
integer register, so they should be added to Int32Regs or Int64Regs
register classes. Otherwise the machine verifier emits an error for
the following LIT tests when LLVM_ENABLE_MACHINE_VERIFIER=1
environment variable is set:

*** Bad machine code: Illegal physical register for instruction ***
- function:    kernel_func
- basic block: %bb.0 entry (0x55c8903d5438)
- instruction: %3:int64regs = LEA_ADDRi64 $vrframelocal, 0
- operand 1:   $vrframelocal
$vrframelocal is not a Int64Regs register.

    CodeGen/NVPTX/call-with-alloca-buffer.ll
    CodeGen/NVPTX/disable-opt.ll
    CodeGen/NVPTX/lower-alloca.ll
    CodeGen/NVPTX/lower-args.ll
    CodeGen/NVPTX/param-align.ll
    CodeGen/NVPTX/reg-types.ll
    DebugInfo/NVPTX/dbg-declare-alloca.ll
    DebugInfo/NVPTX/dbg-value-const-byref.ll

Differential Revision: https://reviews.llvm.org/D110164
2021-10-14 16:19:03 +03:00
Jonas Paulsson c0d88613f2 [SystemZ] Remove some now unused ISD XXX_LOOP opcodes. 2021-10-14 14:55:44 +02:00
Andrew Savonichev dc8a41de34 [ARM] Simplify address calculation for NEON load/store
The patch attempts to optimize a sequence of SIMD loads from the same
base pointer:

    %0 = gep float*, float* base, i32 4
    %1 = bitcast float* %0 to <4 x float>*
    %2 = load <4 x float>, <4 x float>* %1
    ...
    %n1 = gep float*, float* base, i32 N
    %n2 = bitcast float* %n1 to <4 x float>*
    %n3 = load <4 x float>, <4 x float>* %n2

For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16].
However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so
the address is computed before every ld/st instruction:

    add r2, r0, #32
    add r0, r0, #16
    vld1.32 {d18, d19}, [r2]
    vld1.32 {d22, d23}, [r0]

This can be improved by computing address for the first load, and then
using a post-indexed form of VLD1/VST1 to load the rest:

    add r0, r0, #16
    vld1.32 {d18, d19}, [r0]!
    vld1.32 {d22, d23}, [r0]

In order to do that, the patch adds more patterns to DAGCombine:

  - (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1
    and inc2 are constants.

  - (or ptr inc) is now recognized as a pointer increment if ptr is
    sufficiently aligned.

In addition to that, we now search for all possible base updates and
then pick the best one.

Differential Revision: https://reviews.llvm.org/D108988
2021-10-14 15:23:10 +03:00
Simon Pilgrim 88487662f7 [Codegen] TargetLowering::getCanonicalIndexType - early out scaled MVT::i8 indices. NFCI.
Avoids unused assignment scan-build warning.
2021-10-14 13:08:40 +01:00
Simon Pilgrim 77dcdc2f50 [CostModel][X86] Pre-SSE41 targets can use PMADDWD for sext sub-i16 -> i32
Without SSE41 sext/zext instructions the extensions will be split, meaning that the MUL->PMADDWD fold will split the sext_i32(x) into zext_i32(sext_i16(x))
2021-10-14 12:17:40 +01:00
Simon Pilgrim 16729d0f62 [Orc] ELFNixPlatform::setupJITDylib - remove dead return. NFCI.
2 returns, one after the other - reported by coverity
2021-10-14 12:17:40 +01:00
Jeremy Morse e3e1da20d4 Follow up to a3936a6c19, correctly select LiveDebugValues implementation
Some functions get opted out of instruction referencing if they're being
compiled with no optimisations, however the LiveDebugValues pass picks one
implementation and then sticks with it through the rest of compilation.
This leads to a segfault if we encounter a function that doesn't use
instr-ref (because it's optnone, for example), but we've already decided
to use InstrRefBasedLDV which expects to be passed a DomTree.

Solution: keep both implementations around in the pass, and pick whichever
one is appropriate to the current function.
2021-10-14 11:28:53 +01:00
Jonas Paulsson a33e4c8ae9 [SystemZ] Reapply memcmp and memcpy patches.
This reverts 3562076 and includes some refactoring as well.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D111733
2021-10-14 10:37:33 +02:00
Jonas Paulsson 00baad35b2 [SystemZ] Bugfix and refactorization of mem-mem operations
This patch fixes a bug where variable-length and immediate-length mem
operations (such as memcpy, memset, ...) were treated differently. The variable
length case needs to have the length minus 1 passed due to the use of EXRL
target instructions. However, the DAGCombiner can convert a register length
argument into a constant one, and whenever that happened one byte too little
would end up being performed.

This is also a refactorization by reducing the number of opcodes and variants
involved. For any opcode (variable or constant length), only the length minus
one is passed on to the ISD node. The rest of the logic is now instead
handled during isel pseudo expansion.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D111729
2021-10-14 10:37:33 +02:00
Max Kazantsev 6e1308bc10 [SCEV][NFC] Simplify check with CI->isZero() exit condition
Replace the check
    if ((ExitIfTrue && CI->isZero()) || (!ExitIfTrue && CI->isOne()))
with the equivalent and simpler version
    if (ExitIfTrue == CI->isZero())
2021-10-14 14:06:52 +07:00
Max Kazantsev 46a1dd47e6 [SCEV][NFC] Reorder checks to delay call of all_of
Check lightweight getter condition before calling all_of.
2021-10-14 13:30:51 +07:00
Ben Shi 7e81526126 [RISCV] Optimize immediate materialisation with BSETI/BCLRI
Optimize immediate materialisation in the following way if profitable:
1. Use BCLRI for upper 32 bits if the lower 32 bits are negative int32.
2. Use BSETI for upper 32 bits if the lower 32 bits are positive int32.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111508
2021-10-14 04:56:47 +00:00
Abinav Puthan Purayil b3c9d84e5a [AMDGPU] Fix 24-bit mul intrinsic generation for > 32-bit result.
The 24-bit mul intrinsics yield the low-order 32 bits. We should only
do the transformation if the operands are known to be not wider than 24
bits and the result is known to be not wider than 32 bits.

Differential Revision: https://reviews.llvm.org/D111523
2021-10-14 09:00:19 +05:30
Ben Shi 481db13fec [RISCV] Optimize immediate materialisation with SLLI.UW
Use LUI+SLLI.UW to compose the upper bits instead of LUI+SLLI.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111705
2021-10-14 02:24:50 +00:00
Lang Hames 4fcc0ac15e [ORC] Use a Setup object for SimpleRemoteEPC construction.
SimpleRemoteEPC notionally allowed subclasses to override the
createMemoryManager and createMemoryAccess methods to use custom objects, but
could not actually be subclassed in practice (the construction process in
SimpleRemoteEPC::Create could not be re-used).

Instead of subclassing, this commit adds a SimpleRemoteEPC::Setup class that
can be used by clients to set up the memory manager and memory access members.
A default-constructed Setup object results in no change from previous behavior
(EPCGeneric* memory manager and memory access objects used by default).
2021-10-13 16:47:00 -07:00
Shoaib Meenai 6404f4b5af [InstCombine] Remove attributes after hoisting free above null check
If the parameter had been annotated as nonnull because of the null
check, we want to remove the attribute, since it may no longer apply and
could result in miscompiles if left. Similarly, we also want to remove
undef-implying attributes, since they may not apply anymore either.

Fixes PR52110.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D111515
2021-10-13 15:34:56 -07:00
Mircea Trofin 6c76d01011 [mlgo][aot] require the model is autogenerated for test determinism
The tests that exercise the 'release' mode, where the model is AOT-ed,
check the output has certain properties, to validate that, indeed, a
different policy from the default one was exercised. For determinism, we
can't reliably check that output for an arbitrary learned policy, since
it could be that policy happens to mimic the default one in that
particular case.

This patch adds a requirement that those tests run only when the model
is autogenerated (e.g. on build bots).

Differential Revision: https://reviews.llvm.org/D111747
2021-10-13 14:02:41 -07:00
Philip Reames 47d10b25f8 [instcombine] PRE freeze to only potentially poison/undef operand of phi
This extends the foldOpIntoPhi code used when visiting a freeze user of a phi to allow any non-undef/poison operand as opposed to only non-undef/poison constants.  This lets us hoist a freeze in the increment of an IV into the preheader in many cases.

Differential Revision: https://reviews.llvm.org/D111744
2021-10-13 13:55:54 -07:00
Martin Storsjö 2a4b1539e9 [Support] [Path] Use std::replace instead of an explicit comparison loop. NFC.
After 8fc7a907b9, this loop does
the same as a plain `std::replace`.

Also clarify the comment about what this function does.

Differential Revision: https://reviews.llvm.org/D111730
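For illustration, the pattern in question reduces to a single
standard-library call (the specific characters below are an assumption, not
taken from the patch):

  #include <algorithm>
  #include <cassert>
  #include <string>

  int main() {
    std::string Path = "a\\b\\c";
    // One std::replace does the work of the old compare-and-assign loop.
    std::replace(Path.begin(), Path.end(), '\\', '/');
    assert(Path == "a/b/c");
  }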
2021-10-13 22:55:14 +03:00
Roman Lebedev 18eef13dad
[X86][Costmodel] Fix `X86TTIImpl::getGSScalarCost()`
`X86TTIImpl::getGSScalarCost()` has (at least) two issues:
* it naively computes the cost of a sequence of `insertelement`/`extractelement`.
  If we are operating not on the XMM (but YMM/ZMM),
  this widely overestimates the cost of subvector insertions/extractions.
* Gather/scatter takes a vector of pointers, and scalarization results in us performing
  scalar memory operation for each of these pointers, but we never account for the cost
  of extracting these pointers out of the vector of pointers.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111222
2021-10-13 22:35:39 +03:00
Sjoerd Meijer 67a58fa3a6 [FuncSpec] Don't run the solver if there's nothing to do
Even if there are no interesting functions, the SCCP solver would still run
before bailing. Now bail earlier and avoid running the solver for nothing.

Differential Revision: https://reviews.llvm.org/D111645
2021-10-13 19:05:19 +01:00
Arthur Eubanks 3628bb7436 Make various assume bundle data structures use uint64_t
Following D110451, we need to make sure to support 64 bit values.
2021-10-13 10:38:41 -07:00
Joe Nash b44eac1b85 [AMDGPU] Remove unneeded emit literal check
NFC. This check does not verify any functional property since size 8
was added. Remove it for simplicity.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D111737

2021-10-13 12:46:22 -04:00
Kai Nacke 0a950a2e94 [SystemZ/z/OS] Implement save of non-volatile registers on z/OS XPLINK
This patch implements saving the XPLINK callee-saved registers
on z/OS.

Reviewed By: uweigand, Kai

Differential Revision: https://reviews.llvm.org/D111653
2021-10-13 12:57:57 -04:00
Philip Reames 24c9016574 [instcombine] propagate single use freeze(gep inbounds X)
This is a follow-on for D111675 which implements the gep case. I'd originally left it out because I was hoping to actually implement the inrange todo, but after a bit of staring at the code, decided to leave it as is since it doesn't affect this use case (i.e. instcombine requires the operand of the freeze to be an instruction).

Differential Revision: https://reviews.llvm.org/D111691
2021-10-13 09:25:00 -07:00
Jay Foad c885857e9d [AMDGPU] Enable load clustering in the post-RA scheduler
This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
   allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
   pressure, which in some cases gives it more freedom to reorder
   instructions.

Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
  increased by about 1%.
- The number of runs of two or more load instructions increased by
  about 4%.

Differential Revision: https://reviews.llvm.org/D111646
2021-10-13 17:12:26 +01:00
Jeremy Morse fbf269c71e [DebugInfo][InstrRef] Only calculate IDF for reg units
In D110173 we start using the existing LLVM IDF calculator to place PHIs as
we reconstruct an SSA form of machine-code program. Sadly that's slower
than the old (but broken) way, this patch attempts to recover some of that
performance.

The key observation: every time we def a register, we also have to def its
register units. If we def'd $rax, in the current implementation we
independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and
they will all have the same PHI positions. Instead of doing that, we can
calculate the PHI positions for {al, ah} and place PHIs for any aliasing
registers in the same positions. Any def of a super-register has to def
the unit, and vice versa, so this is sound. It cuts down the SSA placement
we need to do significantly.

This doesn't work for stack slots, or registers we only ever read, so place
PHIs normally for those. LiveDebugValues chooses to ignore writes to SP at
calls, and now has to ignore writes to SP register units too.

Differential Revision: https://reviews.llvm.org/D111627
2021-10-13 16:08:18 +01:00
Sanjay Patel 02928fcb8c [InstCombine] improve code comments; NFC 2021-10-13 10:40:44 -04:00
Sanjay Patel 905d170803 [InstCombine] allow matching vector splat constants in foldLogOpOfMaskedICmps()
This is NFC-intended for scalar code. There are still unnecessary
m_ConstantInt restrictions in surrounding code, so this is not a
complete fix.

This prevents regressions seen with a planned follow-on to D111410.
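For context, a sketch of the matcher distinction, assuming LLVM's
PatternMatch.h: m_ConstantInt binds only scalar integer constants, while
m_APInt also binds vector splat constants, which is what the vector cases
here rely on.

  #include "llvm/IR/PatternMatch.h"

  using namespace llvm;
  using namespace llvm::PatternMatch;

  // Returns true for a scalar constant like 'i8 42' *and* for a splat
  // vector constant like '<4 x i8> <i8 42, i8 42, i8 42, i8 42>'.
  bool bindsConstant(Value *V) {
    const APInt *C;
    return match(V, m_APInt(C));
  }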
2021-10-13 10:15:26 -04:00
Jeremy Morse e845ca2ff1 Follow up a3936a6c19 to work around an old compiler bug
Old versions of gcc want template specialisations to happen within the
namespace where the template lives; this is still present in gcc 5.1, which
we officially support, so it has to be worked around.
2021-10-13 13:27:25 +01:00
Jeremy Morse a3936a6c19 [DebugInfo][InstrRef] Use PHI placement utilities for machine locations
InstrRefBasedLDV used to try to determine which values are in which
registers using a lattice approach; however, this is hard to understand, and
broken in various ways. This patch replaces that approach with a standard
SSA approach using existing LLVM utilities. PHIs are placed at dominance
frontiers; value propagation then eliminates unnecessary PHIs.

This patch also adds a bunch of unit tests that should cover many of the
weirder forms of control flow.

Differential Revision: https://reviews.llvm.org/D110173
2021-10-13 12:49:04 +01:00
Kerry McLaughlin 1a2e90199f [SVE][CodeGen] Add patterns for ADD/SUB + element count
This patch adds patterns to match the following with INC/DEC:
 - @llvm.aarch64.sve.cnt[b|h|w|d] intrinsics + ADD/SUB
 - vscale + ADD/SUB

For some implementations of SVE, INC/DEC VL is not as cheap as ADD/SUB and
so this behaviour is guarded by the "use-scalar-inc-vl" feature flag, which for SVE
is off by default. There are no known issues with SVE2, so this feature is
enabled by default when targeting SVE2.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111441
2021-10-13 11:36:15 +01:00
Simon Pilgrim fb2539b9d8 [X86][SSE] Add X86ISD::AVG to isCommutativeBinOp to support folding shuffles through the binop 2021-10-13 10:54:37 +01:00
Heejin Ahn 9261ee32dc [WebAssembly] Make EH work with dynamic linking
This makes Wasm EH work with dynamic linking. So far we were only able
to handle destructors, which do not use any tags or LSDA info.

1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols,
   which points to the address of per-function LSDA info. It is more
   convenient to use than `MCSymbol` because it can take additional
   target flags.

2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the
   symbol relative to `__memory_base` and generate the `add` node. If
   PIC is disabled, continue to use the absolute address.

3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in
   the backend, because it is hard to make them work with dynamic
   linking's loading order. Instead, we make all tag symbols undefined
   in the LLVM backend and import them from JS.

4. Add support for undefined tags to the linker.

Companion patches:
- https://github.com/WebAssembly/binaryen/pull/4223
- https://github.com/emscripten-core/emscripten/pull/15266

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D111388
2021-10-12 23:28:27 -07:00
Lang Hames 447d3017e4 [JITLink][MachO][arm64] Mask high bits out of immediate for LDRLiteral19.
Negative deltas for LDRLiteral19 have their high bits set. If these bits aren't
masked out then they will overwrite other instruction bits, leading to a bogus
encoding.

This long-standing relocation bug was exposed by e50aea58d5, "[JITLink][ORC]
Major JITLinkMemoryManager refactor.", which caused memory layouts to be
reordered, which in turn led to a previously unseen negative delta. (Unseen
because LDRLiteral19s were only created in JITLink passes where they always
pointed at segments that were laid out after them in the old layout.)

No testcase yet: Our existing regression test infrastructure is good at checking
that operand bits are correct, but provides no easy way to test for bad opcode
bits. I'll have a think about the right way to approach this.

https://llvm.org/PR52153
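A standalone sketch of the masking fix (the immediate position follows the
LDR-literal encoding; the surrounding fixup logic is hypothetical):

  #include <cassert>
  #include <cstdint>

  int main() {
    // A negative PC-relative delta is sign-extended, so its high bits
    // are all set.
    int64_t Delta = -8;
    uint32_t Instr = 0x58000001;              // LDR x1, <label> (literal)
    // Keep only the 19 immediate bits before splicing them into bits 23:5;
    // without the mask, the set high bits would clobber the opcode bits.
    uint32_t Imm19 = (Delta >> 2) & 0x7ffff;  // the delta is scaled by 4
    uint32_t Fixed = Instr | (Imm19 << 5);
    assert((Fixed >> 24) == (Instr >> 24));   // opcode bits survive intact
  }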
2021-10-12 22:19:47 -07:00
Visa Hankala a5de04d261 [Support][mips] Remove unnecessary includes from Memory.inc
The mips-specific includes have been unnecessary ever since the
__clear_cache() builtin replaced cacheflush().

Differential Revision: https://reviews.llvm.org/D111486
2021-10-13 07:48:36 +03:00
Philip Reames 4c5702cb12 Fix bug introduced with 6f34839 (poison flags on floating point ops)
The newly introduced API for checking whether poison comes solely from flags which can be dropped was out of sync.  This was noticed by a reviewer post commit.

For the moment, disable the floating point flags.  In a follow up change, I plan to add support in dropPoisonGeneratingFlags, but that deserves to be a change of its own.
2021-10-12 20:25:00 -07:00
Ben Shi 787eeb8597 [RISCV] Optimize immediate materialisation with BCLRI
Do the following optimization for immediate materialisation:

1. For values in the range 0xffffffff 7fffffff ~ 0xffffffff 00000000, first
   generate the lower 32 bits with Val|0x80000000 (which is expected to be an
   int32), then emit (BCLRI r, 31).

2. For values in the range 0x80000000 ~ 0xffffffff, first generate the lower
   32 bits with Val&~0x80000000 (which is expected to be an int32), then
   emit (BSETI r, 31).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111532
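The bit arithmetic behind case 2, as a standalone check (plain C++, not the
actual materialisation code):

  #include <cassert>
  #include <cstdint>

  int main() {
    // Case 2: a value in [0x80000000, 0xffffffff].
    uint64_t Val = 0x80001234;
    // Materialise the lower 32 bits with bit 31 cleared -- a positive
    // int32 that a plain LUI+ADDI pair can produce; materialising Val
    // directly is harder because LUI sign-extends its result on RV64.
    uint64_t Lo = Val & ~0x80000000ull;       // 0x1234
    // A single BSETI r, 31 then restores the top bit.
    assert((Lo | (1ull << 31)) == Val);
  }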
2021-10-13 00:59:23 +00:00
Fangrui Song c2d4fe51bb [X86] Remove little support we had for MPX
GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in 2019.
glibc 2.35 will remove MPX.

Our support is limited: we support assembling of bndmov but not bnd.
Just remove it.

Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D111517
2021-10-12 16:18:51 -07:00
Philip Reames 6f34839407 [instcombine] propagate freeze through single use poison producing flag instruction
If we have an instruction which produces poison only when flags are specified on the instruction, then we know that freezing the operands and dropping flags is equivalent to freezing the result. If we know those flags don't result in any undefined behavior being executed, then there's no point in preserving the flags as we gain no knowledge by having them.

This patch extends the existing propagation logic which sinks freeze to single potential non-poison operands to allow dropping of flags when we know the freeze is the sole use of the instruction with poison flags.

The main value is that we tend to sink freezes towards the phi in IV cycles where the incoming value to the phi is the freeze of an IV increment. This will, in turn (in a future patch), let us fold the freeze through the phi into the loop preheader. Motivated by eliminating the need for CanonicalizeFreezeInLoops for the clearly profitable cases from the onephi.ll test case in the test directory.

Differential Revision: https://reviews.llvm.org/D111675
2021-10-12 13:52:41 -07:00
Albion Fung b4b9f9b4b3 [PowerPC] Emit dcbt and dcbtst in place of their extended mnemonics on AIX
On AIX, the system assembler does not support the extended mnemonics
dcbtt and dcbtstt. This patch stops them from being emitted on
AIX and emits the base mnemonics instead: dcbt X, X, 16 and
dcbtst X, X, 16 respectively.

Differential revision: https://reviews.llvm.org/D111258
2021-10-12 15:47:57 -05:00
Lang Hames 2815ed57e3 [ORC] Shut down dispatcher in ExecutorProcessControl implementations.
f341161689 added a task dispatcher for async handlers, but didn't add a
TaskDispatcher::shutdown call to SelfExecutorProcessControl or SimpleRemoteEPC.
This patch adds the missing call, which ensures that we don't destroy the
dispatcher while tasks are still running.

This should fix the use-after-free crash seen in
https://lab.llvm.org/buildbot/#/builders/5/builds/13063
2021-10-12 13:40:15 -07:00
Ayal Zaks 15692fd6b5 [LV] Fix 2nd crash for reverse interleaved groups under mask/fold-tail.
This patch fixes another crash revealed by PR51614:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse (which is currently not supported).

Differential Revision: https://reviews.llvm.org/D108900
2021-10-12 21:44:42 +03:00
Amara Emerson 5abce56edb [GlobalISel] Add support for constant vector folding of binops in CSEMIRBuilder.
Differential Revision: https://reviews.llvm.org/D111524
2021-10-12 11:31:22 -07:00
Mircea Trofin ea4a6c8426 [Inline] Make sure the InlineAdvisor is correctly cleared.
If another inlining session came after a ModuleInlinerWrapperPass, the
advisor analysis would still be cached, but its Result would be cleared.
We need to clear both.

This addresses PR52118

Differential Revision: https://reviews.llvm.org/D111586
2021-10-12 10:42:41 -07:00
Roman Lebedev 958da6598f
[X86] `detectAVGPattern()`: don't require zext in the with-constant case 2021-10-12 20:24:17 +03:00
Roman Lebedev a1d57f75d1
[NFC][X86] `detectAVGPattern()`: rely on `AVGSplitter()` to perform truncation 2021-10-12 20:12:09 +03:00
Stanislav Mekhanoshin 9cf995be6b [AMDGPU] Promote generic pointer kernel arguments into global
The new pass walks a kernel's pointer arguments, then the loads from them.
If a loaded value is a pointer and the loaded pointer is unmodified in
the kernel before the load, the loaded pointer is promoted to global.
Then it continues recursively.

Differential Revision: https://reviews.llvm.org/D111464
2021-10-12 10:07:33 -07:00
Sanjay Patel 7a2949647a [InstCombine] propagate no-wrap flag through select-of-mul fold
This may not be obvious, but Alive2 agrees:
https://alive2.llvm.org/ce/z/Ld9qNT

If the mul has "nsw", then -1 * INT_MIN is poison, so the
negate can also have "nsw" because 0 - INT_MIN is poison.

If the mul has "nuw", then that means the "OtherOp" can only
be 0 or 1 (anything else multiplied by 0xfff... would wrap).
So the replacement negate must be "nsw" because it is either
"0-0" or "0-1".

This is another regression noticed with a planned follow-up
to D111410.
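The "nuw" claim is easy to confirm exhaustively at i8, for example:

  #include <cassert>

  int main() {
    // If "mul nuw -1, OtherOp" does not wrap (the product still fits in
    // i8), then OtherOp must be 0 or 1 -- so the replacement "0 - OtherOp"
    // is either 0-0 or 0-1 and can safely carry "nsw".
    for (unsigned OtherOp = 0; OtherOp <= 255; ++OtherOp) {
      unsigned Prod = OtherOp * 0xffu;   // -1 is 0xff in i8
      assert((Prod <= 255) == (OtherOp <= 1));
    }
  }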
2021-10-12 12:57:20 -04:00
Roman Lebedev 5f4f5da634
[X86] `detectAVGPattern()`: support basic case of PAVG chaining (PR52131)
As noted in https://github.com/halide/Halide/pull/6302,
we hilariously fail to match PAVG if we so much as
look at it the wrong way.

In this particular case, the problem stems from the fact that
the `PAVG` root (def) is a `trunc`, and the leaves (uses) are `zext`s,
and InstCombine really loves to get rid of both of these,
for example replace them with a bit mask. So we may not have
said `zext`.

Instead of checking for that + type match,
I think we should rely on the actual active type,
as per the knownbits.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111571
2021-10-12 19:51:43 +03:00
Roman Lebedev 7964c3ed82
[X86] `detectAVGPattern()`: small preparatory NFC refactor 2021-10-12 19:51:43 +03:00
Hongtao Yu 098a0d8fbc [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Unlike debug intrinsics, the PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped the default param of the getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by preventing pseudo probes from clobbering memory SSA
- Unblocked induction variable simplification
- Allowed empty loop deletion by treating the probe intrinsic as isDroppable
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847
2021-10-12 09:44:12 -07:00
Jeremy Morse d9fa186a5c Scatter NDEBUG to fix after 838b4a533e
These "dump" methods call into MachineOperand::dump, which doesn't exist
with NDEBUG, thus we croak. Disable LiveDebugValues dump methods when
NDEBUG is turned on to avoid this.
2021-10-12 17:13:15 +01:00
Jay Foad 66ce1015af Revert "[AMDGPU] Enable load clustering in the post-RA scheduler"
This reverts commit 66e13c7f43.

It was committed by accident.
2021-10-12 16:19:35 +01:00
Jay Foad f7ee21aa32 [TwoAddressInstruction] Remove ad hoc machine verification
With the -early-live-intervals command line flag,
TwoAddressInstructionPass::runOnMachineFunction would call
MachineFunction::verify before returning to check the live intervals.
But there was not much benefit to doing this since -verify-machineinstrs
and LLVM_ENABLE_EXPENSIVE_CHECKS provide a more general way of
scheduling machine verification after every pass.

Also it caused problems on targets like Lanai which are marked as "not
machine verifier clean", since verification would fail for known
target-specific problems which have nothing to do with LiveIntervals.

Differential Revision: https://reviews.llvm.org/D111618
2021-10-12 16:09:18 +01:00
Jay Foad 66e13c7f43 [AMDGPU] Enable load clustering in the post-RA scheduler
This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
   allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
   pressure, which in some cases gives it more freedom to reorder
   instructions.

Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
  increased by about 1%.
- The number of runs of two or more load instructions increased by
  about 4%.
2021-10-12 16:09:04 +01:00
Jeremy Morse 838b4a533e [DebugInfo][NFC] Move LiveDebugValues class to header
This patch shifts the InstrRefBasedLDV class declaration to a header.
Partially because it's already massive, but mostly so that I can start
writing some unit tests for it. This patch also adds the boilerplate for
said unit tests.

Differential Revision: https://reviews.llvm.org/D110165
2021-10-12 16:07:26 +01:00
Bradley Smith 2eb42e3d2a [AArch64][SVE] Add fixed type lowering for EXTRACT_SUBVECTOR
Depends on D111135

Differential Revision: https://reviews.llvm.org/D111165
2021-10-12 14:56:15 +00:00
Simon Pilgrim 61d124f7a2 [X86] Fix implicit MathsExtras.h header dependency 2021-10-12 13:37:16 +01:00
Kerry McLaughlin 1439ef1a3f [LoopVectorize] Classify pointer induction updates as scalar only if they have one use
collectLoopScalars collects pointer induction updates in ScalarPtrs, assuming
that the instruction will be scalar after vectorization. This may crash later
in VPReplicateRecipe::execute() if there is another user of the instruction
other than the Phi node which needs to be widened.

This changes collectLoopScalars so that if there are any other users of
Update other than a Phi node, it is not added to ScalarPtrs.

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D111294
2021-10-12 13:24:49 +01:00
Florian Hahn 40d85f16c4
[LoopPeel] Use any_of & contains instead of for & find.
Using contains was suggested in D108114, but I forgot to include it when
landing the patch.
2021-10-12 12:18:01 +01:00
Sjoerd Meijer fc0fa85171 [FuncSpec] Allow ConstExprs that are function pointers
This is a follow-up to D110529, which disallowed constexprs. That change
introduced a regression, as it also disallowed constexprs that are function
pointers, which is actually one of the motivating use cases that we do want
to support.

Differential Revision: https://reviews.llvm.org/D111567
2021-10-12 11:44:26 +01:00
Florian Hahn cd0ba9dc58
[LoopPeel] Peel if it turns invariant loads dereferenceable.
This patch adds a new cost heuristic that allows peeling a single
iteration off read-only loops, if the loop contains a load that

    1. is feeding an exit condition,
    2. dominates the latch,
    3. is not already known to be dereferenceable,
    4. and has a loop invariant address.

If all non-latch exits are terminated with unreachable, such loads
in the loop are guaranteed to be dereferenceable after peeling,
enabling hoisting/CSE'ing them.

This enables vectorization of loops with certain runtime-checks, like
multiple calls to `std::vector::at` if the vector is passed as pointer.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D108114
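A hypothetical C++ loop of the shape this heuristic targets: the
out-of-bounds path of at() calls a noreturn function, so after inlining the
non-latch exit ends in unreachable, and the bounds loads satisfy
conditions 1-4 above.

  #include <vector>

  // The bounds check in at() feeds an exit condition, dominates the latch,
  // and loads the vector's begin/end through a loop-invariant address.
  // Peeling one iteration proves those loads dereferenceable for the
  // remaining loop, so they can be hoisted and the loop vectorized.
  int sum(const std::vector<int> &v, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
      s += v.at(i);
    return s;
  }

  int main() {
    std::vector<int> v{1, 2, 3};
    return sum(v, 3) == 6 ? 0 : 1;
  }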
2021-10-12 11:42:28 +01:00
jacquesguan 0608bbd4e8 [RISCV] Rename assembler mnemonic of unordered floating-point reductions for v1.0-rc change
Rename vfredsum and vfwredsum to vfredusum and vfwredusum. Add aliases for vfredsum and vfwredsum.

Reviewed By: luismarques, HsiangKai, khchen, frasercrmck, kito-cheng, craig.topper

Differential Revision: https://reviews.llvm.org/D105690
2021-10-12 06:46:46 +00:00
Lang Hames 5829ba7afc [ORC] More attempts to work around compiler failures.
Commit 731f991cdc seems to have helped, but did not catch all instances (see
https://lab.llvm.org/buildbot/#/builders/193/builds/104). Switch more inner
structs to C++98 initializers to work around the issue. Add FIXMEs to revisit
in the future.
2021-10-11 22:36:56 -07:00
Lang Hames 3a52a639b1 [ORC] Add more explicit narrowing casts.
This should fix the buildbot failure at
https://lab.llvm.org/buildbot/#/builders/187/builds/2140
2021-10-11 22:00:06 -07:00
Lang Hames 9ca5064153 [ORC] Fix a typo in a variable name. 2021-10-11 21:50:46 -07:00
Lang Hames 962a2479b5 Re-apply e50aea58d5, "Major JITLinkMemoryManager refactor". with fixes.
Adds explicit narrowing casts to JITLinkMemoryManager.cpp.

Honors -slab-address option in llvm-jitlink.cpp, which was accidentally
dropped in the refactor.

This effectively reverts commit 6641d29b70.
2021-10-11 21:39:00 -07:00
Yonghong Song 1321e47298 BPF: rename BTF_KIND_TAG to BTF_KIND_DECL_TAG
Per discussion in https://reviews.llvm.org/D111199,
the existing btf_tag attribute will be renamed to
btf_decl_tag. This patch updates the BTF backend to
use the btf_decl_tag attribute name and also
renames BTF_KIND_TAG to BTF_KIND_DECL_TAG.

Differential Revision: https://reviews.llvm.org/D111592
2021-10-11 21:33:39 -07:00
hsmahesha 52cb3af08c [AMDGPU] Remove dead frame indices after sgpr spill.
All those frame indices which are dead after sgpr spill should be removed from
the function frame. Otherwise, there are side effects, such as the re-mapping
of free frame index ids by later pass(es) like "stack slot coloring", which in
turn could mess up the bookkeeping of "frame index to VGPR lane".

Reviewed By: cdevadas

Differential Revision: https://reviews.llvm.org/D111150
2021-10-12 09:58:49 +05:30
Yonghong Song 325d000765 [NFC][Attr] rename attribute btf_tag to btf_decl_tag
Per discussion in https://reviews.llvm.org/D111199,
the existing btf_tag attribute will be renamed to
btf_decl_tag. This patch mostly updates the Bitcode and
DebugInfo test cases with the new attribute name.

Differential Revision: https://reviews.llvm.org/D111591
2021-10-11 20:57:31 -07:00
Freddy Ye d57a87ea89 [X86][ISel] Lowering llvm.thread.pointer
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D110681
2021-10-12 11:01:18 +08:00
Lang Hames 6641d29b70 Revert "[JITLink][ORC] Major JITLinkMemoryManager refactor."
This reverts commit e50aea58d5 while I
investigate bot failures.
2021-10-11 19:23:41 -07:00
Lang Hames e50aea58d5 [JITLink][ORC] Major JITLinkMemoryManager refactor.
This commit substantially refactors the JITLinkMemoryManager API to: (1) add
asynchronous versions of key operations, (2) give memory manager implementations
full control over link graph address layout, (3) enable more efficient tracking
of allocated memory, and (4) support "allocation actions" and finalize-lifetime
memory.

Together these changes provide a more usable API, and enable more powerful and
efficient memory manager implementations.

To support these changes the JITLinkMemoryManager::Allocation inner class has
been split into two new classes: InFlightAllocation, and FinalizedAllocation.
The allocate method returns an InFlightAllocation that tracks memory (both
working and executor memory) prior to finalization. The finalize method returns
a FinalizedAllocation object, and the InFlightAllocation is discarded. Breaking
Allocation into InFlightAllocation and FinalizedAllocation allows
InFlightAllocation subclasses to be written more naturally, and FinalizedAlloc
to be implemented and used efficiently (see (3) below).

In addition to the memory manager changes this commit also introduces a new
MemProt type to represent memory protections (MemProt replaces use of
sys::Memory::ProtectionFlags in JITLink), and a new MemDeallocPolicy type that
can be used to indicate when a section should be deallocated (see (4) below).

Plugin/pass writers who were using sys::Memory::ProtectionFlags will have to
switch to MemProt -- this should be straightforward. Clients with out-of-tree
memory managers will need to update their implementations. Clients using
in-tree memory managers should mostly be able to ignore it.

Major features:

(1) More asynchrony:

The allocate and deallocate methods are now asynchronous by default, with
synchronous convenience wrappers supplied. The asynchronous versions allow
clients (including JITLink) to request and deallocate memory without blocking.

(2) Improved control over graph address layout:

Instead of a SegmentRequestMap, JITLinkMemoryManager::allocate now takes a
reference to the LinkGraph to be allocated. The memory manager is responsible
for calculating the memory requirements for the graph, and laying out the graph
(setting working and executor memory addresses) within the allocated memory.
This gives memory managers full control over JIT'd memory layout. For clients
that don't need or want this degree of control the new "BasicLayout" utility can
be used to get a segment-based view of the graph, similar to the one provided by
SegmentRequestMap. Once segment addresses are assigned the BasicLayout::apply
method can be used to automatically lay out the graph.

(3) Efficient tracking of allocated memory.

The FinalizedAlloc type is a wrapper for an ExecutorAddr and requires only
64-bits to store in the controller. The meaning of the address held by the
FinalizedAlloc is left up to the memory manager implementation, but the
FinalizedAlloc type enforces a requirement that deallocate be called on any
non-default values prior to destruction. The deallocate method takes a
vector<FinalizedAlloc>, allowing for bulk deallocation of many allocations in a
single call.

Memory manager implementations will typically store the address of some
allocation metadata in the executor in the FinalizedAlloc, as holding this
metadata in the executor is often cheaper and may allow for clean deallocation
even in failure cases where the connection with the controller is lost.

(4) Support for "allocation actions" and finalize-lifetime memory.

Allocation actions are pairs (finalize_act, deallocate_act) of JITTargetAddress
triples (fn, arg_buffer_addr, arg_buffer_size), that can be attached to a
finalize request. At finalization time, after memory protections have been
applied, each of the "finalize_act" elements will be called in order (skipping
any elements whose fn value is zero) as

((char*(*)(const char *, size_t))fn)((const char *)arg_buffer_addr,
                                     (size_t)arg_buffer_size);

At deallocation time the deallocate elements will be run in reverse order (again
skipping any elements where fn is zero).

The returned char * should be null to indicate success, or a non-null
heap-allocated string error message to indicate failure.

These actions allow finalization and deallocation to be extended to include
operations like registering and deregistering eh-frames, TLS sections,
initializer and deinitializers, and language metadata sections. Previously these
operations required separate callWrapper invocations. Compared to callWrapper
invocations, actions require no extra IPC/RPC, reducing costs and eliminating
a potential source of errors.

Finalize lifetime memory can be used to support finalize actions: Sections with
finalize lifetime should be destroyed by memory managers immediately after
finalization actions have been run. Finalize memory can be used to support
finalize actions (e.g. with extra-metadata, or synthesized finalize actions)
without incurring permanent memory overhead.
2021-10-11 19:12:42 -07:00
Amara Emerson 53ebfa7c5d [AArch64][GlobalISel] Fix combiner assertion in matchConstantOp().
We shouldn't call APInt::getSExtValue() on a >64b value.
2021-10-11 15:55:13 -07:00
Guozhi Wei 6599961c17 [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation
This patch contains following enhancements to SrcRegMap and DstRegMap:

  1 In findOnlyInterestingUse, not only check whether Reg has a two-address
    use, but also check whether it can become a two-address use after
    commutation.

  2 If a physical register is clobbered, remove SrcRegMap entries that are
    mapped to it.

  3 In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap
    entry only when the COPY instruction is coalescable. (The COPY src is
    killed.)

With these enhancements, isProfitableToCommute can make better commute
decisions, and as a result more register copies are removed.

Differential Revision: https://reviews.llvm.org/D108731
2021-10-11 15:28:31 -07:00
Alina Sbirlea f7ca54289c [LoopSimplifyCFG] Do not require MSSA. Continue to preserve if available.
LoopSimplifyCFG does not need MSSA, but should preserve it if it's available.

This is a legacy PM change, aimed to denoise the test changes in D109958.

Differential Revision: https://reviews.llvm.org/D111578
2021-10-11 14:27:15 -07:00
Lang Hames 17a0858f9d [ORC] Propagate errors to handlers when sendMessage fails.
In SimpleRemoteEPC, calls from callWrapperAsync to sendMessage may fail.
The handlers may or may not be sent failure messages by handleDisconnect,
depending on when that method is run. This patch adds a check for an un-failed
handler, and if it finds one, sends it a failure message.
2021-10-11 14:23:50 -07:00
Lang Hames 4fc2a4cc01 [ORC] Destroy FinalizeErr if there is a serialization error.
If there is a serialization error then FinalizeErr should never be set, so we
can use cantFail rather than consumeError here.
2021-10-11 14:23:50 -07:00
Nikita Popov 2a2a37d972 [IVUsers] Check for preheader instead of loop simplify form
IVUsers currently makes sure that all loops dominating a user are
in loop simplify form, because SCEVExpander needs a preheader to
insert into. However, loop simplify form requires much more than
that. In particular, it requires dedicated exits, which means that
exits need to be found and walked. For large functions with many
nested loops, this can result in pathological compile-time explosion.

Fix this by only checking the property we're actually interested in,
which is incidentally cheap to check.

Differential Revision: https://reviews.llvm.org/D111493
2021-10-11 23:13:13 +02:00
David Green 860b4479dc [ARM] Be more explicit about disabling CombineBaseUpdate for MVE.
This shouldn't be called for non-neon targets at the moment in either
case, but it is good to be expliit about the CombineBaseUpdate being a
NEON function, not expecting to be run under MVE.
2021-10-11 21:51:45 +01:00
Roman Lebedev 684cbae89a
[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places 2021-10-11 23:36:06 +03:00
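For context, countMaxActiveBits() gives an upper bound on the number of
significant bits of an unsigned value (the bit width minus the minimum number
of leading zeros). A sketch of a typical "fits in N bits" use, assuming
LLVM's KnownBits header:

  #include "llvm/Support/KnownBits.h"

  using namespace llvm;

  // If no more than 32 bits can ever be active, the value is known to
  // fit in a u32 -- a check that previously took two calls and a subtract.
  bool knownToFitInU32(const KnownBits &K) {
    return K.countMaxActiveBits() <= 32;
  }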
Arthur Eubanks 259390de9a [LCG] Don't skip invalidation of LazyCallGraph if CFG analyses are preserved
The CFG being changed and the overall call graph are not related; we can introduce/remove calls without changing the CFG.

Resolves one of the issues in PR51946.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D111275
2021-10-11 13:30:47 -07:00
Sanjay Patel 59441c7329 [InstCombine] fold signbit check of X | (X -1)
There may be some other patterns like this or a generalization,
but this is an example that I noticed would definitely regress
with a planned follow-up to D111410.

https://alive2.llvm.org/ce/z/GVpQDb
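The underlying identity can be checked exhaustively at i8 (the exact
replacement instcombine emits is not spelled out in this message, so only
the identity is verified here):

  #include <cassert>
  #include <cstdint>

  int main() {
    // X | (X - 1) smears the lowest set bit downward, so its sign bit is
    // set exactly when X <= 0 (X == 0 gives 0 | -1 == -1).
    for (int V = -128; V <= 127; ++V) {
      int8_t X = (int8_t)V;
      int8_t Or = X | (int8_t)(X - 1);
      assert((Or < 0) == (X <= 0));
    }
  }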
2021-10-11 16:14:13 -04:00
Arthur Eubanks fbddf22ef7 [SCCP] Properly report changes when changing a pointer argument
Fixes one of the issues in PR51946.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D111277
2021-10-11 13:12:08 -07:00
Jay Foad edfdce2627 [PHIElimination] Fix accounting for undef uses when updating LiveVariables
PHI elimination updates LiveVariables info as described here:

    // We only need to update the LiveVariables kill of SrcReg if this was the
    // last PHI use of SrcReg to be lowered on this CFG edge and it is not live
    // out of the predecessor. We can also ignore undef sources.

Unfortunately if the last use also happened to be an undef use then it
would fail to update the LiveVariables at all. Fix this by not counting
undef uses in the VRegPHIUse map.

Thanks to Mikael Holmén for the test case!

Differential Revision: https://reviews.llvm.org/D111552
2021-10-11 20:22:47 +01:00
Jay Foad 2e1ad93201 [AMDGPU] Fix copying a machine operand
Without this I get:

*** Bad machine code: Instruction has operand with wrong parent set ***
- function:    available_externally_test
- basic block: %bb.0  (0x7dad598)
- instruction: %0:r600_treg32_x = MOV 1, 0, 0, 0, $alu_literal_x, 0, 0, 0, -1, 1, $pred_sel_off, @available_externally, 0

Differential Revision: https://reviews.llvm.org/D111549
2021-10-11 20:22:47 +01:00
Florian Hahn ab33427c86
[VPlan] Print live-in backedge taken count as part of plan.
At the moment, a VPValue is created for the backedge-taken count, which
is used by some recipes. To make it easier to identify the operands of
recipes using the backedge-taken count, print it at the beginning of the
VPlan if it is used.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D111298
2021-10-11 20:13:01 +01:00
Stefan Gränitz a6c9506365 [Orc] Handle hangup messages in SimpleRemoteEPC
On the controller side, handle `Hangup` messages from the executor. The executor passes `Error::success()` or a failure message as the payload.

Hangups cause an immediate disconnect of the transport layer. The disconnect function may be called again later, so implementations should be prepared for that. `FDSimpleRemoteEPCTransport::disconnect()` already has a flag to check that:
cd1bd95d87/llvm/lib/ExecutionEngine/Orc/Shared/SimpleRemoteEPCUtils.cpp (L112)

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D111527
2021-10-11 21:04:56 +02:00
Jonas Devlieghere 070315d04c Revert "Allow signposts to take advantage of deferred string substitution"
This reverts commits f9aba9a5af and
035217ff51.

As explained in the original commit message, this didn't have the
intended effect of improving the common LLDB use case, but still
provided a marginal improvement for the places where LLDB creates a
scoped timer with a string literal.

The reason for the revert is that this change pulls in the os/signpost.h
header in Signposts.h. The former transitively includes loader.h, which
contains a series of macro defines that conflict with MachO.h. There are
ways to work around that, but Adrian and I concluded that none of them
are worth the trade-off in complicating Signposts.h even further.
2021-10-11 11:09:36 -07:00
Joe Nash b4b7e605a6 [AMDGPU] Support shared literals in FMAMK/FMAAK
These instructions should allow src0 to be a literal with the same
value as the mandatory other literal. Enable it by introducing an
operand that defers adding its value to the MI when decoding till
the mandatory literal is parsed.

Reviewed By: dp, foad

Differential Revision: https://reviews.llvm.org/D111067

2021-10-11 13:09:54 -04:00
Philip Reames 7f55209cee [SCEV] Extend trip count to avoid overflow by default
As a brief reminder, an "exit count" is the number of times the backedge executes before some event. It can be zero if we exit before the backedge is reached. A "trip count" is the number of times the loop header is entered if we branch into the loop. In general, TC = BTC + 1, and thus a zero trip count is ill-defined.

There is a corner case which we don't handle well. Let's assume i8 for our examples to keep things simple. If BTC = 255, then the correct trip count is 256. However, 256 is not representable in i8.

In theory, code which needs to reason about trip counts is responsible for checking for this corner case, and either bailing out or handling it correctly. Historically, we don't have a great track record of actually doing so.

When reviewing D109676, I found myself asking a basic question. Was there any good reason to preserve the current wrap-to-zero behavior when converting from backedge taken counts to trip counts? After reviewing existing code, I could not find a single case which appears to correctly and precisely handle the overflow case.

This patch changes the default behavior to extend instead of wrap. That is, if the result might be 256, we return a value of i9 type to ensure we interpret the count correctly. I did leave the legacy behavior as an option since a) loop-flatten stops triggering if I extend due to weirdly specific pattern matching I didn't understand and b) we could reasonably use the mode if we'd externally established a lack of overflow.

I want to emphasize that this change is *not* NFC. There are two call sites (one in ScalarEvolution.cpp, one in LoopCacheAnalysis.cpp) which are switched to the extend semantics. The former appears imprecise (but correct) for a constant 255 BTC. The latter appears incorrect, though I don't have a test case.

Differential Revision: https://reviews.llvm.org/D110587
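The i8 corner case, spelled out in plain C++:

  #include <cassert>
  #include <cstdint>

  int main() {
    uint8_t BTC = 255;                        // backedge-taken count
    uint8_t WrappedTC = BTC + 1;              // legacy behavior: wraps to 0
    uint16_t ExtendedTC = uint16_t(BTC) + 1;  // new behavior: wider type
    assert(WrappedTC == 0 && ExtendedTC == 256);
  }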
2021-10-11 09:55:55 -07:00
Victor Campos 3550e242fa [Clang][ARM][AArch64] Add support for Armv9-A, Armv9.1-A and Armv9.2-A
armv9-a, armv9.1-a and armv9.2-a can be targeted using the -march option
both in ARM and AArch64.

 - Armv9-A maps to Armv8.5-A.
 - Armv9.1-A maps to Armv8.6-A.
 - Armv9.2-A maps to Armv8.7-A.
 - The SVE2 extension is enabled by default on these architectures.
 - The cryptographic extensions are disabled by default on these
 architectures.

The Armv9-A architecture is described in the Arm® Architecture Reference
Manual Supplement Armv9, for Armv9-A architecture profile
(https://developer.arm.com/documentation/ddi0608/latest).

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D109517
2021-10-11 17:44:09 +01:00
Tommi Pisto 42b588a200 [ORC] Add static and dynamic library generator support to C API.
Adds LLVMOrcCreateStaticLibrarySearchGeneratorForPath and
LLVMOrcCreateDynamicLibrarySearchGeneratorForPath functions to create generators
for static and dynamic libraries.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108535
2021-10-11 09:25:59 -07:00
hyeongyu kim 0aeb37324d [SimpleLoopUnswitch] Re-fix introduction of UB when hoisted condition may be undef or poison
https://bugs.llvm.org/show_bug.cgi?id=27506
https://bugs.llvm.org/show_bug.cgi?id=31652
https://bugs.llvm.org/show_bug.cgi?id=51043

Problems with SimpleLoopUnswitch caused the bug reports above.

```
while (...) {
  if (C) { A }
  else   { B }
}
Into:

C' = freeze(C)
if (C') {
  while (...) { A }
} else {
  while (...) { B }
}
```
This problem can be solved by adding a freeze on hoisted branches (the above
transform), and it was solved by D29015. However, D29015 was later reverted
due to a performance regression (2b5a897651).

It is not the first time that an added freeze has caused a performance
regression. SimplifyCFG also had a problem with UB caused by
branching-on-undef, which was solved by adding freeze to the branching
condition (D104569). A performance regression occurred with D104569 as well,
and patches such as D105344 and D105392 were written to minimize it.

This patch corrects SimpleLoopUnswitch the way D104569 handled SimplifyCFG,
while minimizing performance loss by introducing patches like D105344 and
D105392. (This patch was rebased with the author's permission.)

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D106041
2021-10-12 01:02:09 +09:00
Raphael Isemann bdc35b0efc [Object] Deduplicate the three createError functions
The Object library currently has three identical functions that translate a
Twine into a parser error. Until recently these functions have coexisted
peacefully, but since D110320 Clang with enabled modules is now diagnosing that
we have several definitions of `createError` in Object.

This patch just merges them all and puts them into Object's `Error.h` where the
error code for `parse_failed` is also defined which seems cleaner and unbreaks
the bots.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D111541
2021-10-11 16:36:57 +02:00
Simon Pilgrim ad16c6e52f [X86][AVX] Ensure we retain zero elements in select(pshufb,pshufb) -> or(pshufb,pshufb) fold (PR52122)
The select(pshufb,pshufb) -> or(pshufb,pshufb) fold uses getConstVector to create the refreshed pshufb masks, which treats all negative indices as undef.

PR52122 shows that if we are selecting an element that the PSHUFB has set to zero, we must set it back to 0x80 when we recreate the PSHUFB mask, and not just leave it as SM_SentinelZero.
2021-10-11 14:50:24 +01:00
Bradley Smith 03065ecd85 [AArch64][SVE] Ensure LowerEXTRACT_SUBVECTOR is not called for illegal types
The lowering for EXTRACT_SUBVECTOR should not be called during type
legalization, only as part of lowering, hence return SDValue() when
called on illegal types.

This also adds missing tests for extracting fixed types from illegal
scalable types.

Differential Revision: https://reviews.llvm.org/D111412
2021-10-11 11:20:50 +00:00
Andrew Savonichev 7ae8f392a1 [AArch64] Emit AssertZExt for i1 arguments
AAPCS requires an i1 argument to be zero-extended to 8 bits by the
caller. Emit a new AArch64ISD::ASSERT_ZEXT_BOOL hint (or AssertZExt
for GlobalISel) to enable some optimization opportunities. In
particular, when the argument is forwarded to the callee, we can avoid
zero-extension and use it as-is.

Differential Revision: https://reviews.llvm.org/D107160
2021-10-11 11:55:11 +03:00
David Sherwood 26b7d9d622 [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-11 09:41:38 +01:00
Clement Courbet 83ded5d323 re-land "[AA] Teach BasicAA to recognize basic GEP range information."
Now that PR52104 is fixed.
2021-10-11 10:04:22 +02:00
Clement Courbet 6aaf1e7ea9 [LoopIdiom] Fix store size SCEV type.
We were using the type of the loop back edge count to represent the
store size. This failed for small loop counts (e.g. in the added test,
the loop count was an i2).

Use the index type instead.

Fixes PR52104.

Differential Revision: https://reviews.llvm.org/D111401
2021-10-11 09:39:06 +02:00
Lang Hames 4d7cea3d2e [ORC] Add optional RunPolicy to ExecutorProcessControl::callWrapperAsync.
The callWrapperAsync and callSPSWrapperAsync methods take a handler object
that is run on the return value of the call when it is ready. The new RunPolicy
parameters allow clients to control how these handlers are run. If no policy is
specified then the handler will be packaged as a GenericNamedTask and dispatched
using the ExecutorProcessControl's TaskDispatch member. Callers can use the
ExecutorProcessControl::RunInPlace policy to cause the handler to be run
directly instead, which may be preferable for simple handlers, or they can
write their own policy object (e.g. to dispatch as some other kind of Task,
rather than GenericNamedTask).
2021-10-10 20:41:59 -07:00
Esme-Yi a00ff71668 [XCOFF] Improve error message context.
Summary: This patch improves the error message context of the
XCOFF interfaces by providing more details.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D110320
2021-10-11 02:52:20 +00:00
Lang Hames 771e69484a [ORC] Add dependence on pthreads library to ORC.
f341161689 introduced a dependence (for builds with LLVM_ENABLE_THREADS) on
pthreads. This commit updates the CMakeLists.txt file to include a LINK_LIBS
entry for pthreads.
2021-10-10 19:34:34 -07:00
Lang Hames f341161689 [ORC] Add TaskDispatch API and thread it through ExecutorProcessControl.
ExecutorProcessControl objects will now have a TaskDispatcher member which
should be used to dispatch work (in particular, handling incoming packets in
the implementation of remote EPC implementations like SimpleRemoteEPC).

The GenericNamedTask template can be used to wrap function objects that are
callable as 'void()' (along with an optional name to describe the task).
The makeGenericNamedTask functions can be used to create GenericNamedTask
instances without having to name the function object type.

In a future patch ExecutionSession will be updated to use the
ExecutorProcessControl's dispatcher, instead of its DispatchTaskFunction.
2021-10-10 18:39:55 -07:00
Amara Emerson f1e9ecea44 [AArch64][GlobalISel] Legalize G_VECREDUCE_XOR. Treated same as other bitwise reductions. 2021-10-10 17:01:21 -07:00
Lang Hames da7f993a8d [ORC] Reorder callWrapperAsync and callSPSWrapperAsync parameters.
The callee address is now the first parameter and the 'SendResult' function
the second. This change improves consistency with the non-async functions
where the callee is the first address and the return value the second.
2021-10-10 13:10:43 -07:00
Dawid Jurczak 9e65929a8e [DSE] Re-enable calloc transformation with extra care (PR25892)
The transformation from malloc+memset to calloc is always correct, and in many
situations it brings significant observable benefits in terms of execution
speed and memory consumption [1][2]. Unfortunately there are cases when
producing calloc causes performance drops [3]. As discussed here:
https://reviews.llvm.org/D103009, it's possible to differentiate between those
two scenarios. If the optimizer is able to prove that after the malloc call
it's _very_ likely to reach the memset branch, then after calloc emission we
shouldn't observe any performance hits. Therefore finding a "null pointer
check" pattern before the memset basic block sounds like good justification
for performing the transformation. That method was also already suggested by
GCC folks [4]. The main reason for the change is that, to be safe, we
currently check for a post-dominance relation, which is way too conservative
an approach, making the transformation "almost" disabled in practice. This
patch tends to enable the transformation again, but with extra care.

[1] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc
[2] https://vorpus.org/blog/why-does-calloc-exist/
[3] http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83022

Differential Revision: https://reviews.llvm.org/D110021
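A sketch of the source pattern the pass now looks for (hypothetical helper,
plain C++; the point is the null pointer check guarding the memset):

  #include <cstdlib>
  #include <cstring>

  void *alloc_zeroed(size_t n) {
    void *p = malloc(n);
    if (!p)                 // the "null pointer check" before the memset
      return nullptr;       // block: reaching malloc makes reaching memset
    memset(p, 0, n);        // very likely, so folding malloc+memset into
    return p;               // calloc(1, n) is expected to be profitable
  }                         // (and is always correct)

  int main() {
    free(alloc_zeroed(16));
  }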
2021-10-10 21:47:14 +02:00
Sanjay Patel 05281d95f2 [InstCombine] move fold for "(X-Y) == 0"; NFC
This consolidates related folds that all have a
similar use restriction that may not be necessary.
2021-10-10 11:26:03 -04:00
Sanjay Patel da210f5d34 [InstCombine] canonicalize "(C2 - Y) > C" as (Y + ~C2) < ~C
The test diffs show that we have better analysis/folds for 'add'
(although we should at least have the simplifications
independently, so we don't have the one-use restriction).

This is related to solving regressions that would appear in
transforms related to D111410, and that is part of a series
of enhancements that may eventually help solve PR34047.

https://alive2.llvm.org/ce/z/3tB9KG

  define i1 @src(i8 %x, i8 %C, i8 %C2) {
    %sub = sub nuw i8 %C2, %x
    %r = icmp slt i8 %sub, %C
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %C, i8 %C2) {
    %Cnot = xor i8 %C, -1
    %C2not = xor i8 %C2, -1
    %add = add nuw i8 %x, %C2not
    %r = icmp sgt i8 %add, %Cnot
    ret i1 %r
  }
2021-10-10 11:06:49 -04:00
william woodruff e7fc254875 [BitcodeAnalyzer] allow a motivated user to dump BLOCKINFO
This adds the `--dump-blockinfo` flag to `llvm-bcanalyzer`, allowing a sufficiently motivated user to dump (parts of) the `BLOCKINFO_BLOCK` block. The default behavior is unchanged, and `--dump-blockinfo` only takes effect in the same context as other flags that control dump behavior (i.e., requires that `--dump` is also passed).

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D107536
2021-10-10 10:15:14 +05:30
Amara Emerson f95d9c95bb [GlobalISel] Fix the stores of truncates -> wide store combine for non-evenly dividing type sizes.
If the size of the wide store we'd generate is not a multiple of the memory
type of the narrow stores (e.g. s48 vs. s32), we'd assert. Fix that.
2021-10-09 21:18:20 -07:00
Sanjay Patel acafde09a3 [InstCombine] enhance icmp with sub folds
There were 2 related but over-specified folds for:
C1 - X == C

One allowed multi-use but was limited to equal constants.
The other allowed different constants but disallowed multi-use.

This combines the 2 folds into a more general match.
The test diffs show the multi-use cases that were falling
through the cracks.

https://alive2.llvm.org/ce/z/4_hEt2

  define i1 @src(i8 %x, i8 %subC, i8 %C) {
    %s = sub i8 %subC, %x
    %r = icmp eq i8 %s, %C
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %subC, i8 %C) {
    %newC = sub i8 %subC, %C
    %isneg = icmp eq i8 %x, %newC
    ret i1 %isneg
  }
2021-10-09 11:39:49 -04:00
Dávid Bolvanský 943b304848 Fixed some errors detected by PVS Studio 2021-10-09 17:27:41 +02:00
Dávid Bolvanský 3649fb14d1 Fixed some errors detected by PVS Studio 2021-10-09 17:20:04 +02:00
Nikita Popov ea12adc169 [CanonicalizeFreeze] Drop IVUsers.h include (NFC)
Looking for users of IVUsers, this was a false positive. Only LSR
uses IVUsers.
2021-10-09 17:01:26 +02:00
David Green adec922361 [AArch64] Make -mcpu=generic schedule for an in-order core
We would like to start pushing -mcpu=generic towards enabling the set of
features that improves performance for some CPUs, without hurting any
others. A blend of the performance options hopefully beneficial to all
CPUs. The largest part of that is enabling in-order scheduling using the
Cortex-A55 schedule model. This is similar to the Arm backend change
from eecb353d0e which made -mcpu=generic perform in-order scheduling
using the cortex-a8 schedule model.

The idea is that in-order cpu's require the most help in instruction
scheduling, whereas out-of-order cpus can for the most part out-of-order
schedule around different codegen. Our benchmarking suggests that
hypothesis holds. When running on an in-order core this improved
performance by 3.8% geomean on a set of DSP workloads, 2% geomean on
some other embedded benchmark and between 1% and 1.8% on a set of
singlecore and multicore workloads, all running on a Cortex-A55 cluster.

On an out-of-order cpu the results are a lot more noisy but show flat
performance or an improvement. On the set of DSP and embedded
benchmarks, run on a Cortex-A78 there was a very noisy 1% speed
improvement. Using the most detailed results I could find, SPEC2006 runs
on a Neoverse N1 show a small increase in instruction count (+0.127%),
but a decrease in cycle counts (-0.155%, on average). The instruction
count is very low noise, the cycle count is more noisy with a 0.15%
decrease not being significant. SPEC2k17 shows a small decrease (-0.2%)
in instruction count leading to a -0.296% decrease in cycle count. These
results are within noise margins but tend to show a small improvement in
general.

When specifying an Apple target, clang will set "-target-cpu apple-a7"
on the command line, so should not be affected by this change when
running from clang. This also doesn't enable more runtime unrolling like
-mcpu=cortex-a55 does, only changing the schedule used.

A lot of existing tests have been updated. This is a summary of the important
differences:
 - Most changes are the same instructions in a different order.
 - Sometimes this leads to very minor inefficiencies, such as requiring
   an extra mov to move variables into r0/v0 for the return value of a test
   function.
 - misched-fusion.ll was no longer fusing the pairs of instructions it
   should, as per D110561. I've changed the schedule used in the test
   for now.
 - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to
   the different latencies. This seems fine to me.
 - Some SVE tests do not always remove movprfx where they did before due
   to different register allocation giving different destructive forms.
 - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll
   produce two LDR where they previously produced an LDP due to
   store-pair-suppress kicking in.
 - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LDP.
 - Some tests such as arm64-neon-mul-div.ll and
   ragreedy-local-interval-cost.ll have more, less or just different
   spilling.
 - In aarch64_generated_funcs.ll.generated.expected one part of the
   function is no longer outlined. Interestingly, if I switch this to use
   any other schedule, even less is outlined.

Some of these are expected to happen, such as differences in outlining
or register spilling. There will be places where these result in worse
codegen and places where they are better, with the SPEC instruction counts
suggesting it is not a decrease overall, on average.

Differential Revision: https://reviews.llvm.org/D110830
2021-10-09 15:58:31 +01:00
Nikita Popov a94002cd64 [Type] Avoid APFloat.h include (NFC)
This is only used by a handful of methods working on fltSemantics,
and having these defined inline in the header does not look
particularly important.
2021-10-09 11:29:26 +02:00
Nikita Popov 55b9146848 [MCPseudoProbe] Clean up includes (NFC)
This was including various things that don't appear to be used in
the header at all.
2021-10-09 10:31:15 +02:00
luxufan 02ac5e5cf1 [Orc] Fix global variable destructor function support when --jit-kind=orc-lazy
The bug was reported here https://bugs.llvm.org/show_bug.cgi?id=52030

This patch follows the idea that @lhames commented in the above webpage.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D110990
2021-10-09 15:58:21 +08:00
Max Kazantsev 4c0da23663 [LoopDeletion] Support selects when symbolically evaluating 1st iteration
Adds support for selects for which we know the value on the 1st iteration.
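
As a hedged, hypothetical illustration (not taken from the patch), a loop
of this shape contains a select whose value is fully determined on the 1st
iteration, which the symbolic evaluation can now see through:

    // On iteration 0 the select yields `a`, so evaluating the 1st
    // iteration proves the early exit is taken and the loop is dead.
    int firstIterKnownSelect(int a, int b) {
      for (int i = 0; i < 100; ++i) {
        int v = (i == 0) ? a : b; // select with known 1st-iteration value
        if (v == a)
          return 1; // provably taken when i == 0
      }
      return 0;
    }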

Differential Revision: https://reviews.llvm.org/D104111
Reviewed By: nikic
2021-10-09 14:47:44 +07:00
luxufan 590326382d [Orc] Support atexit in Orc(JITLink)
There is a bug reported at https://bugs.llvm.org/show_bug.cgi?id=48938

After looking through glibc, I found that `atexit(f)` is the same as `__cxa_atexit(f, NULL, NULL)`. In the Orc runtime, we identify different JITDylibs by their dso_handle values, so a NULL dso_handle is invalid. In this patch, I added a `PlatformJDDSOHandle` to ELFNixRuntimeState, and functions which are registered by atexit will be registered at the PlatformJD.
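
A minimal sketch of the equivalence described above, assuming glibc-style
semantics (illustrative only, not code from the patch):

    extern "C" int __cxa_atexit(void (*func)(void *), void *arg,
                                void *dso_handle);

    static void handler(void *) { /* cleanup */ }

    void registerExitHandler() {
      // atexit(handler) behaves like this call: no argument and no owning
      // DSO handle, which is why a platform-level handle is needed to give
      // such registrations a valid home.
      __cxa_atexit(handler, nullptr, nullptr);
    }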

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D111413
2021-10-09 12:25:47 +08:00
william woodruff 778bf73d7b [BitcodeReader] fix a logic error in vector type element validation
The current code checks whether the vector's element type is a valid structure element type, rather than a valid vector element type. The two have separate implementations but accept only very slightly different sets of types, which is probably why this wasn't caught before.
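
A hedged sketch of the corrected check, using LLVM's existing predicates
(the surrounding reader code is abbreviated; the helper is hypothetical):

    #include "llvm/IR/DerivedTypes.h"

    // Validate the element type with the vector-specific predicate rather
    // than the struct-specific one; the two accept slightly different sets.
    bool isValidVectorElement(llvm::Type *EltTy) {
      return llvm::VectorType::isValidElementType(EltTy);
      // The buggy check was effectively:
      //   llvm::StructType::isValidElementType(EltTy)
    }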

Differential Revision: https://reviews.llvm.org/D109655
2021-10-09 09:42:02 +05:30
Brad Smith 65df10f3cd [OpenBSD] Use cortex-a8 as default CPU for ARMv7 2021-10-08 23:57:40 -04:00
Qiu Chaofan f45d5e71d3 [APFloat] Set size of PPCDoubleDouble to 128
566690b0 uses size information in float semantics, but PPCDoubleDouble
left them empty.

As a follow-up, we can consider removing PPCDoubleDoubleLegacy and filling
in the other fields in the future.
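
A small hedged check of the new behavior (API names per llvm/ADT/APFloat.h):

    #include "llvm/ADT/APFloat.h"
    #include <cassert>

    void checkPPCDoubleDoubleSize() {
      // The float semantics for ppc_fp128 should now report 128 bits.
      assert(llvm::APFloat::semanticsSizeInBits(
                 llvm::APFloat::PPCDoubleDouble()) == 128);
    }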

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D111398
2021-10-09 10:12:10 +08:00
Arthur Eubanks 20a0c482e0 [LICM] Use Align instead of int 2021-10-08 18:26:15 -07:00
Arthur Eubanks a0a4935182 Make more places that use alignment use uint64_t
Followup to D110451.
2021-10-08 16:35:19 -07:00
Nick Desaulniers 9697f93587 [InlineCost] model calls to llvm.is.constant* more carefully
llvm.is.constant* intrinsics evaluate to the integral values 0 or 1.

A common use case for llvm.is.constant comes from the higher level
__builtin_constant_p. A common usage pattern of __builtin_constant_p in
the Linux kernel is:

    void foo (int bar) {
      if (__builtin_constant_p(bar)) {
        // lots of code that will fold away to a constant.
      } else {
        // a little bit of code, usually a libcall.
      }
    }

A minor issue in InlineCost calculations is that when `bar` is _not_ Constant
and still will not be after inlining, we don't discount the true branch,
and the inline cost of `foo` ends up being the cost of both branches
together, rather than just the false branch.

This leads to code like the above where inlining will not help prove bar
Constant, but it still would be beneficial to inline foo, because the
"true" branch is irrelevant from a cost perspective.

For example, IPSCCP can sink a passed constant argument to foo:

    const int x = 42;
    void bar (void) { foo(x); }

This improves our inlining decisions, and fixes a few head-scratching
cases where the disassembly shows a relatively small `foo` not inlined
into a lone caller.

We could further improve this modeling by tracking whether the argument
to llvm.is.constant* is a parameter of the function, and if inlining
would allow that parameter to become Constant. This idea is noted in a
FIXME comment.

Link: https://github.com/ClangBuiltLinux/linux/issues/1302

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D111272
2021-10-08 15:27:30 -07:00
Reid Kleckner b3a6d096d7 Fix shlib builds for all lib/Target/*/TargetInfo libs
They all must depend on MC now that the target registry is in MC.
Also fix llvm-cxxdump
2021-10-08 15:21:13 -07:00
Reid Kleckner 2827b1b89d Fix shared library build after TargetRegistry move 2021-10-08 15:06:03 -07:00
Reid Kleckner 89b57061f7 Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.
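
For downstream code, the practical effect is an include-path change along
these lines (a sketch, not a full migration guide):

    // Before the move:
    //   #include "llvm/Support/TargetRegistry.h"
    // After the move:
    #include "llvm/MC/TargetRegistry.h"

    // Clients must also link against the MC library, e.g. by listing MC in
    // their LLVM link components.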

Differential Revision: https://reviews.llvm.org/D111454
2021-10-08 14:51:48 -07:00
Leonard Chan 4dc462b589 [AArch64] Emit CFI instruction for updating x18 when using ShadowCallStack with exception unwinding
PR45875 notes an instance where exception handling crashes on aarch64-fuchsia,
where SCS is enabled by default. The underlying issue seems to be that within
libunwind's various _Unwind_* functions, the x18 register is not updated if a
function is marked with nounwind. This removes the check for nounwind and emits
the CFI instruction that updates x18.

Differential Revision: https://reviews.llvm.org/D79822
2021-10-08 14:20:26 -07:00
Nikita Popov e3129fb792 [LoopFlatten] Mark inner loop as deleted
If a loop is flattened, the inner loop is removed and the LPM
should be informed of this fact, so it can invalidate associated
analyses. To support this, we relax an assertion in LPMUpdater to
allow invalidating non-top-level loops when running in LoopNestMode,
as the pass does not know how exactly it will get scheduled.
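
A hedged sketch of the reporting step (LPMUpdater's API; names are
illustrative, not lifted from the patch):

    #include "llvm/Transforms/Scalar/LoopPassManager.h"

    // After flattening succeeds, report the inner loop's death so the LPM
    // can invalidate the analyses associated with it.
    void reportInnerLoopDeleted(llvm::Loop &Inner, llvm::LPMUpdater &Updater) {
      Updater.markLoopAsDeleted(Inner, Inner.getName());
    }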

Differential Revision: https://reviews.llvm.org/D111350
2021-10-08 23:12:15 +02:00
Lang Hames 8fe3d9df0e Revert "[ORC] Move SimpleRemoteEPCServer::Dispatcher into OrcShared."
This reverts commit dfd74db981.

SimpleRemoteEPC should share dispatch with the ExecutionSession, rather than
having two different dispatch systems on the controller side.
SimpleRemoteEPCServer::Dispatch doesn't need to be shared.
2021-10-08 13:43:42 -07:00
Lang Hames a129305b28 [ORC] Remove a stale comment.
SimpleRemoteEPCServer Service shutdown (c965fde7c2) takes care of this.
2021-10-08 13:43:42 -07:00
Andrew Browne 007d98f520 [DFSan] Fix warning: getArgsFunctionType defined but not used
Warning introduced in 61ec2148c5
2021-10-08 11:58:36 -07:00
Arthur Eubanks a3358fcff1 More followup type changes after 05392466 2021-10-08 11:51:36 -07:00
Lang Hames dfd74db981 [ORC] Move SimpleRemoteEPCServer::Dispatcher into OrcShared.
Renames SimpleRemoteEPCServer::Dispatcher to SimpleRemoteEPCDispatcher and
moves it into OrcShared. SimpleRemoteEPCServer::ThreadDispatcher is similarly
moved and renamed to DynamicThreadPoolSimpleRemoteEPCDispatcher.

This will allow these classes to be reused by SimpleRemoteEPC on the controller
side of the connection.
2021-10-08 11:29:57 -07:00
Amara Emerson 17b89f9daa [GlobalISel] Improve G_UMULH -> LSHR combine to accept non-uniform constant vectors. 2021-10-08 11:25:26 -07:00
Andrew Browne 61ec2148c5 [DFSan] Remove -dfsan-args-abi support in favor of TLS.
ArgsABI was originally added in https://reviews.llvm.org/D965

Current benchmarking does not show a significant difference.
There is no need to maintain both ABIs.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D111097
2021-10-08 11:18:36 -07:00
Craig Topper a9700653ab [RegisterScavenging] Use a Twine in a call to report_fatal_error instead of going from std::string to c_str. NFC
The std::string was built on the line above. Might as well just
build it as a Twine in the call.
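
A hedged before/after sketch of the pattern (identifiers are illustrative):

    #include "llvm/ADT/StringRef.h"
    #include "llvm/ADT/Twine.h"
    #include "llvm/Support/ErrorHandling.h"

    void fail(llvm::StringRef Name) {
      // Before: std::string Msg = ("cannot scavenge register: " + Name).str();
      //         report_fatal_error(Msg.c_str());
      // After: build the message as a Twine directly in the call.
      llvm::report_fatal_error(llvm::Twine("cannot scavenge register: ") + Name);
    }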
2021-10-08 11:04:08 -07:00
Philip Reames edf31b4db1 [IPT] Add a statistic to track instructions scanned to answer queries
I'm planning some changes to the invalidation mechanism here, and having a concrete way to track progress is key.
2021-10-08 10:59:35 -07:00
Philip Reames de5477ed42 Add a statistic to track number of times we rebuild instruction ordering
The goal here is to assist some future tuning work both on instruction ordering invalidation, and on some client code which uses it.
2021-10-08 10:59:34 -07:00
Arthur Eubanks 9405217999 Revert "Recommit "[LoopPeel] Peel loops with deoptimizing exits""
This reverts commit d68b59f3eb.

This is causing crashes, see D110922 for details.
2021-10-08 10:53:23 -07:00
Philip Reames b4498e6b8d [IPT] Narrow scope of removeInstruction invalidation [NFC]
We only need to invalidate if the instruction being removed is the cached "first special instruction".  If the instruction is before that one, it can't (by assumption) be special.  If it is after that one, it wasn't the first.
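
A hedged sketch of the narrowed condition (the class shape and member names
here are hypothetical):

    #include "llvm/IR/Instruction.h"

    struct InstructionPrecedenceTrackingSketch {
      const llvm::Instruction *CachedFirstSpecialInst = nullptr;
      void invalidateCache() { CachedFirstSpecialInst = nullptr; }
      void removeInstruction(const llvm::Instruction *I) {
        // Only removing the cached "first special instruction" can change
        // the cached answer: earlier instructions are non-special by
        // assumption, and later ones were never the cached first.
        if (I == CachedFirstSpecialInst)
          invalidateCache();
      }
    };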
2021-10-08 10:35:03 -07:00
Andreas Schwab a706a5ef22 [Support] Define sys::getHostCPUName for RISC-V
The RISCV target doesn't define a "generic" cpu, only "generic-rv32" and
"generic-rv64".  Define sys::getHostCPUName for RISC-V that returns the
correct cpu for the host.
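
A hedged sketch of the described behavior for a RISC-V host (the in-tree
logic may differ in detail):

    #include "llvm/ADT/StringRef.h"

    llvm::StringRef riscvHostCPUName() {
    #if defined(__riscv) && (__riscv_xlen == 64)
      return "generic-rv64"; // 64-bit RISC-V host
    #else
      return "generic-rv32"; // 32-bit RISC-V host
    #endif
    }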

Reviewed By: craig.topper, MaskRay

Differential Revision: https://reviews.llvm.org/D105274
2021-10-08 10:08:39 -07:00
Philip Reames d694dd0f0d Add iterator range variants of isGuaranteedToTransferExecutionToSuccessor [mostly-nfc]
This factors out utilities for scanning a bounded block of instructions since we have this code repeated in a bunch of places.  The change to InlineFunction isn't strictly NFC as the limit mechanism there didn't handle debug instructions correctly.
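
A hedged sketch of what such an iterator-range variant might look like (the
exact in-tree signature is not reproduced here):

    #include "llvm/Analysis/ValueTracking.h"
    #include "llvm/IR/BasicBlock.h"

    // Returns true if every instruction in [Begin, End) is guaranteed to
    // transfer execution to its successor.
    bool transfersExecution(llvm::BasicBlock::iterator Begin,
                            llvm::BasicBlock::iterator End) {
      for (auto It = Begin; It != End; ++It)
        if (!llvm::isGuaranteedToTransferExecutionToSuccessor(&*It))
          return false;
      return true;
    }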
2021-10-08 09:50:10 -07:00
Bradley Smith 7c68d4b8ff Revert "[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR."
This reverts commit 3e8d2008f7.

The code removed in this commit is actually required for extracting
fixed types from illegal scalable types, hence this commit causes
assertion failures in such extracts.
2021-10-08 14:53:26 +00:00
David Stuttard 69f7d81d0a [AMDGPU] Set number of VGPRs used in PS shaders based on input registers actually used
For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR
registers.

Calculate the number of VGPR registers used as input VGPRs based on these
registers rather than on the arguments passed in (the latter conservatively
always allocates the maximum).

Differential Revision: https://reviews.llvm.org/D101633

Change-Id: Idf7c060cbbd5f7e3300102c55ecee3c07f209de6
2021-10-08 14:24:35 +01:00