Commit Graph

149978 Commits

Author SHA1 Message Date
Amara Emerson 95ac3d15e9 [AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization.
For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize
completely if the source is <= 64b. This change adds support for that in
the legalizer. If the source has a pow-2 num elements, then we can do
a tree reduction using the scalar operation in the individual elements.
Otherwise, we just create a sequential chain of operations.

For AArch64, we only need to scalarize if the input is <64b. If it's great than
64b then we can first do a fewElements step to 64b, taking advantage of vector
instructions until we reach the point of scalarization.

I also had to relax the verifier checks for reductions because the intrinsics
support <1 x EltTy> types, which we lower to scalars for GlobalISel.

Differential Revision: https://reviews.llvm.org/D108276
2021-08-19 16:38:52 -07:00
Fangrui Song fbb8e772ec [InstrProfiling] Make COFF use the ELF comdat scheme (drop link.exe compatibility)
The COFF specific `DataReferencedByCode` complexity (D103372 D103717) is due to
a link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE is
not really dropped, so it can cause duplicate definition error.
2021-08-19 16:38:32 -07:00
Amara Emerson a0051f7149 [AArch64][GlobalISel] Fix miscompile of <16 x s8> G_EXTRACT_VECTOR_ELT.
When support for copying vector s8 lanes was added recently, this also
had the side effect of fixing a fallback for <16 x s8> extracts since
both used the same helper. However, there was a bug in another helper
to get the regclass for a specific FPR-native type, which was assigning
FPR16 to s8 instead of FPR8.
2021-08-19 16:22:32 -07:00
Thomas Lively be6c49e743 [WebAssembly] Add explicit casts to silence -Wc++11-narrowing 2021-08-19 16:00:07 -07:00
Yonghong Song c1169b8bd3 Revert "[DebugInfo] generate btf_tag annotations for DIComposite types"
This reverts commit 2fded193e7.

Builtbot reports some test failures. Revert now so I can take time
to fix the issues.
2021-08-19 15:54:38 -07:00
Yonghong Song 2fded193e7 [DebugInfo] generate btf_tag annotations for DIComposite types
Clang patch D106614 added attribute btf_tag support. This patch
generates btf_tag annotations for DIComposite types.
A field "annotations" is introduced to DIComposite, and the
annotations are represented as an DINodeArray, similar to
DIComposite elements. The following example illustrates
how annotations are encoded in IR:
  distinct !DICompositeType(..., annotations: !10)
  !10 = !{!11, !12}
  !11 = !{!"btf_tag", !"a"}
  !12 = !{!"btf_tag", !"b"}
Each btf_tag annotation is represented as a 2D array of
meta strings. Each record may have more than one
btf_tag annotations, as in the above example.

Differential Revision: https://reviews.llvm.org/D106615
2021-08-19 15:37:44 -07:00
Thomas Lively fd0557dbf1 [WebAssembly] More convert_low and promote_low codegen
The convert_low and promote_low instructions can widen the lower two lanes of a
four-lane vector, but we were previously scalarizing patterns that widened lanes
besides the low two lanes. The commit adds a shuffle to move the widened lanes
into the low lane positions so the convert_low and promote_low instructions can
be used instead of scalarizing.

Depends on D108266.

Differential Revision: https://reviews.llvm.org/D108341
2021-08-19 15:37:12 -07:00
Thomas Lively b311a040ef [WebAssembly] Pattern match SIMD convert_low and promote_low during ISel
Since the simplest DAG patterns for convert_low and promote_low instructions
involved v2i32, v2f32, v4i64, and v4f64 types, which are not legal in the
WebAssembly backend and would be eliminated by type legalization, we were
previously matching those patterns in a DAG combine before the type legalization
stage. However in cases where the vectors were wider than 128 bits, the patterns
we matched were not created until the type legalization stage when the wide
vectors were split up. Type legalization would continue to eliminate the illegal
types we were matching as well, so the code ended up scalarized.

To make the ISel for these instructions more robust, match the scalarized
patterns rather than the patterns containing illegal types. Add tests with
double-wide vectors to show that this works as intended.

Fixes PR51098.
Depends on D107502.

Differential Revision: https://reviews.llvm.org/D108266
2021-08-19 15:24:28 -07:00
Akira Hatanaka 898dc4590c Refactor inlineRetainOrClaimRVCalls. NFC
This is in preparation for committing https://reviews.llvm.org/D103000.
2021-08-19 14:55:45 -07:00
Arthur Eubanks 7c8206cd2a [NFC] Cleanup AttributeList::getStackAlignment()
So that we don't use a confusing index.
2021-08-19 14:21:40 -07:00
Alexandre Rames cd28003336 [Support] Update `MD5` to follow other hashes.
Introduce `StringRef final()` and `StringRef result()`.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D107781
2021-08-19 14:13:14 -07:00
Arthur Eubanks 44a3241f10 [NFC] Replace some attribute methods that use confusing indexes 2021-08-19 14:10:26 -07:00
Alexandre Rames 10a126325d [NFC][Support] Move `MD5` members in `InternalState`.
This prepares an update to follow other hashes.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D108388
2021-08-19 14:04:14 -07:00
Florian Mayer 73323c6eaa [hwasan] re-enable stack safety by default.
The failed assertion was fixed in D108337.

Reviewed By: vitalybuka, eugenis

Differential Revision: https://reviews.llvm.org/D108381
2021-08-19 21:11:24 +01:00
Adrian Prantl 1e586bcc3e Move function definition out-of-line to fix the modularized build (NFC) 2021-08-19 12:26:23 -07:00
Thomas Lively b69374ca58 [WebAssembly] Legalize vector types by widening
The default legalization of unsupported vector types is to promote the integers
in each lane, which leads to extra sign or zero extending and masking when
moving data into and out of vectors. Switch our preferred type legalization from
the default to vector widening, which keeps the data in the low lanes of the
vector rather than in the low bits of each lane. The unused high lanes can be
ignored.

Half-wide vectors are now loaded from memory into the low 64 bits of the v128
rather than spread out among the lanes. As a result, v128.load64_splat is a much
more common operation, so add new patterns to support it.

Differential Revision: https://reviews.llvm.org/D107502
2021-08-19 12:07:33 -07:00
Philip Reames 17b9cb1817 [runtimeunroll] Support multiple exits to latch exit w/prolog loop
This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used.

This is the prolog companion to D107381. Since this was LGTMed, a problem with DT updating was reported against that patch.  I roled in the analogous fix here as it seemed obvious, and not worth re-review.

As an aside, our prolog form leaves a lot of potential value on the floor when there is an invariant load or invariant condition in the loop being runtime unrolled. We should probably consider a "required prolog" heuristic.  (Alternatively, maybe we should be peeling these cases more aggressively?)

Differential Revision: https://reviews.llvm.org/D108262
2021-08-19 11:43:52 -07:00
Stanislav Mekhanoshin 8d7d89b081 [AMDGPU] Add alias.scope metadata to lowered LDS struct
Alias analysis is unable to disambiguate accesses to the structure
fields without it unlike distinct variables. As a result we cannot
combine ds_read and ds_write operations in a case of any store in
between which always considered clobbering.

Differential Revision: https://reviews.llvm.org/D108315
2021-08-19 11:40:30 -07:00
Nikita Popov 8cf5b69f69 [GuardWidening] Preserve MemorySSA
As reported on https://bugs.llvm.org/show_bug.cgi?id=51020, the
guard widening pass doesn't preserve MemorySSA, so it can no
longer be scheduled in the same loop pass manager as LICM. However,
the loop-schedule.ll test indicates that this is supposed to work.

Fix this by preserving MemorySSA if available, as this seems to be
trivial in this case (we only need to drop the memory access for
the removed guards).

Differential Revision: https://reviews.llvm.org/D108386
2021-08-19 20:23:17 +02:00
Philip Reames 447256f22b [runtimeunroll] Fix reported DT verification error after 94d0914
In 94d0914, I added support for unrolling of multiple exit loops which have multiple exits reaching the latch.  Per reports on the review post commit, I'd missed updating the domtree for one case.  This fix addresses that ommission.

There's no new test as this is covered by existing tests with expensive verification turned on.
2021-08-19 11:06:17 -07:00
Tim Northover edab411ee6 AArch64: copy all parts of the mem operand across when combining a store
In particular we were dropping volatility, which can lead to unwanted
transformations.
2021-08-19 18:26:39 +01:00
Chang-Sun Lin, Jr 9cae598f8b
[InstCombine] Avoid folding GEPs across loop boundaries
Folding a GEP from outside to inside a loop will materialize an add where there wasn't an equivalent operation before. Check the containing loops before making this fold.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107935
2021-08-19 20:03:44 +03:00
Arthur Eubanks 33d44b762e [OpaquePtr][Inline] Use byval type instead of pointee type
Reviewed By: #opaque-pointers, dblaikie

Differential Revision: https://reviews.llvm.org/D105711
2021-08-19 09:56:08 -07:00
Owen Anderson 06a4c85890 Use v16i8 rather than v2i64 as the VT for memset expansion on AArch64.
This allows the instruction selector to realize that it can directly
broadcast the low byte of the memset value, rather than replicating
it to a 64-bit GPR before broadcasting.

This fixes PR50985.

Differential Revision: https://reviews.llvm.org/D108354
2021-08-19 16:54:07 +00:00
Augie Fackler e59c88294b MemoryBuiltins: trailing , on collection literal
This was probably bugging more than is reasonable, but it makes merging
changes in this file slightly less annoying to have the trailing comma
here. I only noticed this because Rust is currently carrying a patch to
this file and it kept making life a little difficult.
2021-08-19 17:59:23 +02:00
Craig Topper 84cea602f9 Revert "[SelectionDAGBuilder] Compute and cache PreferredExtendType on demand."
This reverts commit add08c8741.

There was a compile time jump on tramp3d-v4 on https://llvm-compile-time-tracker.com/
Want to see if it goes away with this reverted.
2021-08-19 08:42:05 -07:00
David Green d10f23a25d [ISel] Expand saddsat and ssubsat via asr and xor
This changes the lowering of saddsat and ssubsat so that instead of
using:
  r,o = saddo x, y
  c = setcc r < 0
  s = c ? INTMAX : INTMIN
  ret o ? s : r
into using asr and xor to materialize the INTMAX/INTMIN constants:
  r,o = saddo x, y
  s = ashr r, BW-1
  x = xor s, INTMIN
  ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD

This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.

Differential Revision: https://reviews.llvm.org/D105853
2021-08-19 16:08:07 +01:00
Simon Pilgrim 9419729b6a [CostModel][X86] Add VPOPCNTDQ/BITALG ctpop costs
VPOPCNTDQ + BITALG add ctpop instructions for vXi64/vXi32 + vXi16/vXi8 vector types respectively
2021-08-19 15:40:09 +01:00
Craig Topper add08c8741 [SelectionDAGBuilder] Compute and cache PreferredExtendType on demand.
Previously we pre-calculated this and cached it for every
instruction in the function. Most of the calculated results will
never be used. So instead calculate it only on the first use, and
then cache it.

The cache was originally added to fix a compile time issue which
caused r216066 to be reverted.

This change exposed that we weren't pre-computing the Value for
Arguments. I've explicitly disabled that for now as it seemed to
regress some tests on AArch64 which has sext built into its compare
instructions.

Spotted while investigating how to improve heuristics to work better
with RISCV preferring sign extend for unsigned compares for i32 on RV64.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107976
2021-08-19 07:18:33 -07:00
Craig Topper c60a4c1ba5 [TypePromotion] Use Instruction* instead of Value* for a couple functions. NFC
This matches how they are called and allows some isa/cast/dyn_cast
to be removed.

Differential Revision: https://reviews.llvm.org/D108333
2021-08-19 07:09:38 -07:00
Craig Topper 36d8316cc8 [RISCV] Reduce duplicate code for calling SimplifyDemandedBits.
This encapsulates the APInt creation and worklist management into
a helper function.

To keep one common interface I've use Log2_32 in places that
previously created a mask by subtracting 1 from a power of 2.

Differential Revision: https://reviews.llvm.org/D108324
2021-08-19 07:09:38 -07:00
Alexey Lapshin ab9d506be3 [DWARF][Verifier][NFC] Use reference to DWARFAddressRangesVector to avoid copying.
Avoid copying while access to RangesOrError.get().
2021-08-19 16:23:05 +03:00
Sanjay Patel ec54e275f5 Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."
This reverts commit 9934a5b2ed.
This patch may cause miscompiles because it missed a constraint
as shown in the examples from:
https://llvm.org/PR51531
2021-08-19 08:43:51 -04:00
Sanjay Patel eee0ded337 [InstCombine] add min/max intrinsics as freely invertible candidates
In the optimized test, we are able to peak through the
min/max that has 2 min/max operands and invert them all:
https://alive2.llvm.org/ce/z/7gYMN5
2021-08-19 08:41:38 -04:00
Sanjay Patel e10c3beca5 [InstCombine] add one-use check for min/max fold with not operands; NFC
This makes the intrinsic logic match the cmp+select idiom folds
just below. It's not clearly a win either way unless we think
that a 'not' op costs more than min/max.

The cmp+select folds on these patterns are more extensive than
the intrinsics currently and may have some complicated interactions,
so I'm trying to make those line up and bring the optimizations
for intrinsics up to parity.
2021-08-19 08:41:38 -04:00
Rosie Sumpter d1aa075129 [LoopFlatten] Fix assertion failure
There is an assertion failure in computeOverflowForUnsignedMul
(used in checkOverflow) due to the inner and outer trip counts
having different types. This occurs when the IV has been widened,
but the loop components are not successfully rediscovered.
This is fixed by some refactoring of the code in findLoopComponents
which identifies the trip count of the loop.

Differential Revision: https://reviews.llvm.org/D108107
2021-08-19 13:18:57 +01:00
Fraser Cormack e6b1ac8546 [LegalizeTypes][VP] Add widening support for binary VP ops
This patch adds the beginnings of more thorough support in the
legalizers for vector-predicated (VP) operations.

The first step is the ability to widen illegal vectors. The more
complicated scenario in which the result/operands need widening but the
mask doesn't has not been handled here. That would require a lot of code
without an in-tree target on which to test it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107904
2021-08-19 13:08:47 +01:00
Matthew Devereau 734708e04f [AArch64][SVE] Teach cost model that masked loads/stores are cheap
Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.
2021-08-19 13:01:33 +01:00
Frederik Gossen c20cb5547d Avoid unused variable when NDEBUG 2021-08-19 13:00:16 +02:00
Bjorn Pettersson 36d5138619 [NewPM] Make some sanitizer passes parameterized in the PassRegistry
Refactored implementation of AddressSanitizerPass and
HWAddressSanitizerPass to use pass options similar to passes like
MemorySanitizerPass. This makes sure that there is a single mapping
from class name to pass name (needed by D108298), and options like
-debug-only and -print-after makes a bit more sense when (despite
that it is the unparameterized pass name that should be used in those
options).

A result of the above is that some pass names are removed in favor
of the parameterized versions:
- "khwasan" is now "hwasan<kernel;recover>"
- "kasan" is now "asan<kernel>"
- "kmsan" is now "msan<kernel>"

Differential Revision: https://reviews.llvm.org/D105007
2021-08-19 12:43:37 +02:00
Andrzej Warzynski dcc6b7b1d5 [OptTable] Refine how `printHelp` treats empty help texts
Currently, `printHelp` behaves differently for options that:
  * do not define `HelpText` (such options _are not printed_), and
  * define its `HelpText` as `HelpText<"">` (such options _are printed_).
In practice, both approaches lead to no help text and `printHelp` should
treat them consistently. This patch addresses that by making
`printHelpt` check the length of the help text to be printed.

All affected tests have been updated accordingly. The option definitions
for llvm-cvtres have been updated with a short description or "Not
  implemented" for options that are ignored by the tool.

Differential Revision: https://reviews.llvm.org/D107557
2021-08-19 09:30:15 +00:00
David Sherwood f4122398e7 [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64
I have added a new TTI interface called enableOrderedReductions() that
controls whether or not ordered reductions should be enabled for a
given target. By default this returns false, whereas for AArch64 it
returns true and we rely upon the cost model to make sensible
vectorisation choices. It is still possible to override the new TTI
interface by setting the command line flag:

  -force-ordered-reductions=true|false

I have added a new RUN line to show that we use ordered reductions by
default for SVE and Neon:

  Transforms/LoopVectorize/AArch64/strict-fadd.ll
  Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D106653
2021-08-19 09:29:40 +01:00
Wenlei He eca03d2768 [CSSPGO] Track and use context-sensitive post-optimization function size to drive global pre-inliner in llvm-profgen
This change enables llvm-profgen to use accurate context-sensitive post-optimization function byte size as a cost proxy to drive global preinline decisions.

To do this, BinarySizeContextTracker is introduced to track function byte size under different inline context during disassembling. In preinliner, we can not query context byte size under switch `context-cost-for-preinliner`. The tracker uses a reverse trie to keep size of functions under different context (callee as parent, caller as child), and it can give best/longest possible matching context size for given input context.

The new size cost is off by default. There're a few TODOs that needs to addressed: 1) avoid dangling string from `Offset2LocStackMap`, which will be addressed in split context work; 2) using inlinee's entry probe to make sure we have correct zero size for inlinee that's completely optimized away after inlining. Some tuning is also needed.

Differential Revision: https://reviews.llvm.org/D108180
2021-08-18 22:50:57 -07:00
luxufan a9095f005f [JITLink] Optimize GOTPCRELX Relocations
This patch optimize the GOTPCRELX Reloations, which is described in X86-64 psabi chapter B.2. And Not all optimization of this chapter is implemented.

1. Convert call and jmp has been implemented
2. Convert mov, but the optimization that when the symbol is defined in the lower 32-bit address space, memory operand in `mov` can be convertted into immediate operand has not been implemented.
3. Conver Test and Binop has not been implemented.

The new test file named ELF_got_plt_optimizations.s has been added, and I moved some test cases about optimization of got/plt from ELF_x86_64_small_pic_relocations.s to the new test file.

By referencing the lld, so, the optimization `Convert call and jmp` is not same as what psabi says, and I have explained it in the comment.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108280
2021-08-19 10:30:22 +08:00
Peter Collingbourne 6f85225ef3 StackLifetime: Remove asserts for multiple lifetime intrinsics.
According to the langref, it is valid to have multiple consecutive
lifetime start or end intrinsics on the same object.

For llvm.lifetime.start:
"If ptr [...] is a stack object that is already alive, it simply
fills all bytes of the object with poison."

For llvm.lifetime.end:
"Calling llvm.lifetime.end on an already dead alloca is no-op."

However, we currently fail an assertion in such cases. I've observed
the assertion failure when the loop vectorization pass duplicates
the intrinsic.

We can conservatively handle these intrinsics by ignoring all but
the first one, which can be implemented by removing the assertions.

Differential Revision: https://reviews.llvm.org/D108337
2021-08-18 18:45:28 -07:00
Rong Xu 5fdaaf7fd8 [SampleFDO] Flow Sensitive Sample FDO (FSAFDO) profile loader
This patch implements Flow Sensitive Sample FDO (FSAFDO) profile
loader. We have two profile loaders for FS profile,
one before RegAlloc and one before BlockPlacement.

To enable it, when -fprofile-sample-use=<profile> is specified,
add "-enable-fs-discriminator=true \
     -disable-ra-fsprofile-loader=false \
     -disable-layout-fsprofile-loader=false"
to turn on the FS profile loaders.

Differential Revision: https://reviews.llvm.org/D107878
2021-08-18 18:37:35 -07:00
Kyungwoo Lee 829616c241 [NFC][DebugInfo] getDwarfCompileUnitID
This is a refactoring for the use in https://reviews.llvm.org/D108261

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D108271
2021-08-18 17:35:03 -07:00
Jessica Paquette c22b64ef66 [AArch64][GlobalISel] Don't allow s128 for G_ISNAN
getAPFloatFromSize doesn't support s128, so we can't lower this without
asserting right now.

To fix the buildbots, don't allow any scalars other than s16, s32, and s64.
2021-08-18 13:59:00 -07:00
Jessica Paquette 3d91d5b757 [AArch64][GlobalISel] Mark G_FMINNUM/G_FMAXNUM as floating point opcodes
We need to ensure that these end up on FPR to allow imported patterns to
select them.

This will also ensure that we get good regbank selection when dealing with
instructions like G_PHI/G_LOAD/G_STORE which deduce their banks from their
uses/users.

Differential Revision: https://reviews.llvm.org/D108260
2021-08-18 13:32:19 -07:00
Jessica Paquette 45e1a6bd25 [AArch64][GlobalISel] Legalize scalar G_FMINNUM + G_FMAXNUM
For subtargets with full FP16, this is legal for s16, s32, and s64. Without
full FP16, it's legal for s32 and s64.

For s128, this is a libcall.

We also support some vector types, but for now, let's just support scalars.

Differential Revision: https://reviews.llvm.org/D108259
2021-08-18 13:30:03 -07:00