Commit Graph

44488 Commits

Author SHA1 Message Date
Nikita Popov 40bc309911 Revert "[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration"
This reverts commit d40b4911bd.

This causes a large compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=0aa637b2037d882ddf7861284169abf63f524677&to=d40b4911bd9aca0573752e065f29ddd9aff280e1&stat=instructions
2021-03-16 20:41:26 +01:00
Mircea Trofin d40b4911bd [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration
The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs()
needs to be called before iterating the interfering live ranges.

The rest of the patch offers support that is the case: instead of  clearing the query's
InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix
is invalidated (existing triggering mechanism).

Without the change in RegAllocGreedy.cpp, the compiler ices.

This patch should make it more easily discoverable by developers that
collectInterferringVregs needs to be called before iterating.

I will follow up with a subsequent patch to improve the usability and maintainability of Query.

Differential Revision: https://reviews.llvm.org/D98232
2021-03-16 12:10:10 -07:00
Nick Lewycky b743bbc505 Add ConstantDataVector::getRaw() to create a constant data vector from raw data.
This parallels ConstantDataArray::getRaw() and can be used with ConstantDataSequential::getRawDataValues() in the base class for both types.

Update BuildConstantData{Array,Vector} tests to test the getRaw API. Also removes its unused Module.

In passing, update some comments to include the support for half and bfloat. Update tests to include testing for bfloat.

Differential Revision: https://reviews.llvm.org/D98302
2021-03-16 11:57:53 -07:00
Aaron Puchert 1cb15b10ea Correct Doxygen syntax for inline code
There is no syntax like {@code ...} in Doxygen, @code is a block command
that ends with @endcode, and generally these are not enclosed in braces.
The correct syntax for inline code snippets is @c <code>.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D98665
2021-03-16 15:17:45 +01:00
serge-sans-paille b2e78a061c [NFC] Use SmallString instead of std::string for the AttrBuilder
This avoids a few unnecessary conversion from StringRef to std::string, and a
bunch of extra allocation thanks to the SmallString.

Differential Revision: https://reviews.llvm.org/D98190
2021-03-16 13:34:14 +01:00
Caroline Concatto 3c03635d53 [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse
This patch adds support for reverse loop vectorization.
It is possible to vectorize the following loop:
```
  for (int i = n-1; i >= 0; --i)
    a[i] = b[i] + 1.0;
```
with fixed or scalable vector.
The loop-vectorizer will use 'reverse' on the loads/stores to make
sure the lanes themselves are also handled in the right order.
This patch adds support for scalable vector on IRBuilder interface to
create a reverse vector. The IR function
CreateVectorReverse lowers to experimental.vector.reverse for scalable vector
and keedp the original behavior for fixed vector using shuffle reverse.

Differential Revision: https://reviews.llvm.org/D95363
2021-03-16 07:51:59 +00:00
Bing1 Yu 4f198b0c27 [X86] Pass to transform amx intrinsics to scalar operation.
This pass runs in any situations but we skip it when it is not O0 and the
function doesn't have optnone attribute. With -O0, the def of shape to amx
intrinsics is near the amx intrinsics code. We are not able to find a
point which post-dominate all the shape and dominate all amx intrinsics.
To decouple the dependency of the shape, we transform amx intrinsics
to scalar operation, so that compiling doesn't fail. In long term, we
 should improve fast register allocation to allocate amx register.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D93594
2021-03-16 10:40:22 +08:00
Lang Hames ecf6466f01 [JITLink][MachO][x86-64] Introduce generic x86-64 support.
This patch introduces generic x86-64 edge kinds, and refactors the MachO/x86-64
backend to use these edge kinds. This simplifies the implementation of the
MachO/x86-64 backend and makes it possible to write generic x86-64 passes and
utilities.

The new edge kinds are different from the original set used in the MachO/x86-64
backend. Several edge kinds that were not meaningfully distinguished in that
backend (e.g. the PCRelMinusN edges) have been merged into single edge kinds in
the new scheme (these edge kinds can be reintroduced later if we find a use for
them). At the same time, new edge kinds have been introduced to convey extra
information about the state of the graph. E.g. The Request*AndTransformTo**
edges represent GOT/TLVP relocations prior to synthesis of the GOT/TLVP
entries, and the 'Relaxable' suffix distinguishes edges that are candidates for
optimization from edges which should be left as-is (e.g. to enable runtime
redirection).

ELF/x86-64 will be refactored to use these generic edges at some point in the
future, and I anticipate a similar refactor to create a generic arm64 support
header too.

Differential Revision: https://reviews.llvm.org/D98305
2021-03-15 15:43:07 -07:00
Nick Lewycky 483a253ae9 NFC: Formatting changes.
Run clang-format over these files.

Capitalize some variable names per clang-tidy's request.

Pulled out to simplify review of D98302.
2021-03-15 14:26:39 -07:00
Florian Hahn bb244ea2a8
[AnnotationRemarks] Remove unneeded Function.h include (NFC). 2021-03-15 21:09:35 +00:00
Wenlei He a5d30421a6 [CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list
For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining.

For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work:

 - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee.
 - In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added.
 - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile).
 - Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate.

Differential Revision: https://reviews.llvm.org/D98590
2021-03-15 12:22:15 -07:00
Fangrui Song 5d44c92bf8 Change void getNoop(MCInst &NopInst) to MCInst getNop()
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
2021-03-15 12:05:34 -07:00
Stelios Ioannou ab86edbc88 [AArch64] Implement __rndr, __rndrrs intrinsics
This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.

These intrinsics store the random number in their pointer argument and return a status
code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter
intrinsic reseeds the random number generator.

The instructions write the NZCV flags indicating the success of the operation that we can
then read with a CSET.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838

Differential Revision: https://reviews.llvm.org/D98264

Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
2021-03-15 17:51:48 +00:00
Zahira Ammarguellat 80ca4fd154 [NFC] Fix "unused parameter" error revealed in the Linux self-build. 2021-03-15 12:17:11 -04:00
Jan Kratochvil a28facba1c [llvm] [dwarf] Fix DWARFListTableHeader::getOffsetEntry off-by-one
D98289 was erroneously reporting `invalid range list offset 0x20110`
instead of `invalid range list table index 0`.

Differential Revision: https://reviews.llvm.org/D98589
2021-03-14 21:42:44 +01:00
Matt Arsenault d57d8f364f CodeGen: Reorder MachinePointerInfo fields
This saves a little bit of padding.
2021-03-14 10:06:39 -04:00
kuterd 44c1425c17 [Attributor][fix] Remove problematic EXPENSIVE_CHECK
Remove the check that is causing compilation issues in
some build configurations.
2021-03-13 18:03:24 +03:00
Lang Hames 4e30b20bdb [JITLink][ORC] Make the LinkGraph available to modifyPassConfig.
This makes the target triple, graph name, and full graph content available
when making decisions about how to populate the linker pass pipeline.

Also updates the LLJITWithObjectLinkingLayerPlugin example to show more
API use, including use of the API changes in this patch.
2021-03-12 18:42:51 -08:00
Hubert Tong 4f9cc1512d Revert "[AsmParser][SystemZ][z/OS] Introducing HLASM Comment Syntax"
This reverts commit bcdd40f802.
See https://reviews.llvm.org/D98543.
2021-03-12 14:48:00 -05:00
Zahira Ammarguellat c2006f857d [NFC] Fix "unused parameter" error revealed in the Linux self-build. 2021-03-12 10:26:40 -08:00
Anirudh Prasad bcdd40f802 [AsmParser][SystemZ][z/OS] Introducing HLASM Comment Syntax
- This patch adds in support for the ordinary HLASM comment syntax asm
  statements (Reference - Chapter 7, Comment Statements, Ordinary Comment
  Statements)
- In brief, the ordinary comment syntax if used, must begin with the "*"
  character
- To achieve this, this patch makes use of the CommentString attribute
  provided in the base MCAsmInfo class
- In the SystemZMCAsmInfo class, the CommentString attribute was set to
  "*" based on the assembler dialect
- Furthermore, a new attribute RestrictCommentString, is provided to only
  treat a string as a comment if it appears at the start of the asm
  statement. Example: "jo *-4" is valid in HLASM (jump back 4 bytes from
  current point - similar to jo -4 in gnu asm) and we don't want "*-4" to
  be treated as a comment.
- RFC for HLASM Parser support implementation: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html

Reviewed By: scott.linder, Kai

Differential Revision: https://reviews.llvm.org/D97703
2021-03-12 11:56:11 -05:00
Matt Arsenault 6b76d82853 GlobalISel: Fix marking byval arguments as immutable
byval arguments need to be assumed writable. Only implicitly stack
passed arguments which aren't addressable in the IR can be assumed
immutable.

Mips is still broken since for some reason its doing its own thing
with the ValueHandlers (and x86 doesn't actually handle byval
arguments now, although some of the code is there).
2021-03-12 09:01:53 -05:00
serge-sans-paille bc4a5bdce4 [NFC] Use StringRef instead of const char* for AsmPrinter
This avoids calling strlen to repeatedly compute some string size.
2021-03-12 14:39:25 +01:00
Hans Wennborg f50aef745c Revert "[InstrProfiling] Don't generate __llvm_profile_runtime_user"
This broke the check-profile tests on Mac, see comment on the code
review.

> This is no longer needed, we can add __llvm_profile_runtime directly
> to llvm.compiler.used or llvm.used to achieve the same effect.
>
> Differential Revision: https://reviews.llvm.org/D98325

This reverts commit c7712087cb.

Also reverting the dependent follow-up commit:

Revert "[InstrProfiling] Generate runtime hook for ELF platforms"

> When using -fprofile-list to selectively apply instrumentation only
> to certain files or functions, we may end up with a binary that doesn't
> have any counters in the case where no files were selected. However,
> because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
> runtime would still be pulled in and incur some non-trivial overhead,
> especially in the case when the continuous or runtime counter relocation
> mode is being used. A better way would be to pull in the profile runtime
> only when needed by declaring the __llvm_profile_runtime symbol in the
> translation unit only when needed.
>
> This approach was already used prior to 9a041a7522, but we changed it
> to always generate the __llvm_profile_runtime due to a TAPI limitation.
> Since TAPI is only used on Mach-O platforms, we could use the early
> emission of __llvm_profile_runtime there, and on other platforms we
> could change back to the earlier approach where the symbol is generated
> later only when needed. We can stop passing -u__llvm_profile_runtime to
> the linker on Linux and Fuchsia since the generated undefined symbol in
> each translation unit that needed it serves the same purpose.
>
> Differential Revision: https://reviews.llvm.org/D98061

This reverts commit 87fd09b25f.
2021-03-12 13:53:46 +01:00
Serguei Katkov cfe8f8e0f0 Revert "Mark gc.relocate and gc.result as readnone"
As readnone function they become movable and LICM can hoist them
out of a loop. As a result in LCSSA form phi node of type token
is created. No one is ready that GCRelocate first operand is phi node
but expects to be token.

GVN test were also updated, it seems it does not do what is expected.
Test for LICM is also added.

This reverts commit f352463ade.
2021-03-12 16:59:17 +07:00
Johannes Doerfert 0fe0d114e4 Revert "[OpenMP] Introduce the `disable_selector_propagation` variant selector trait"
Need to revert ad9e98b8ef which this
commit depends on.

This reverts commit f771ef7b5f0ed260d00931cd50e6fe462edbacaf.
2021-03-11 23:48:35 -06:00
Johannes Doerfert b2642456ab [OpenMP] Introduce the `disable_selector_propagation` variant selector trait
Nested `omp [begin|end] declare variant` inherit the selectors from
surrounding `omp (begin|end) declare variant` constructs. To stop such
propagation the user can add the `disable_selector_propagation` to the
`extension` set in the `implementation` selector.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D95765
2021-03-11 23:31:25 -06:00
Carl Ritson c07f2025e4 [AMDGPU] Restrict image_msaa_load to MSAA dimension types
This instruction is only valid on 2D MSAA and 2D MSAA Array
surfaces.  Remove intrinsic support for other dimension types,
and block assembly for unsupported dimensions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D98397
2021-03-12 09:47:24 +09:00
Florian Hahn c92ec0dd92
[Matrix] Add support for matrix-by-scalar division.
This patch extends the matrix spec to allow matrix-by-scalar division.

Originally support for `/` was left out to avoid ambiguity for the
matrix-matrix version of `/`, which could either be elementwise or
specified as matrix multiplication M1 * (1/M2).

For the matrix-scalar version, no ambiguity exists; `*` is also
an elementwise operation in that case. Matrix-by-scalar division
is commonly supported by systems including Matlab, Mathematica
or NumPy.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D97857
2021-03-11 22:21:23 +00:00
Wenlei He 051f2c144e [SamplePGO] Skip inlinee profile scaling for sample loader inlining
For CGSCC inline, we need to scale down a function's branch weights and entry counts when thee it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weigths for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skip the inlinee profile scaling for sample loader inlining.

Differential Revision: https://reviews.llvm.org/D98187
2021-03-11 10:18:26 -08:00
David Green fad70c3068 [ARM] Improve WLS lowering
Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over to the WLS while loops, improving the
chance of lowering them successfully. To do this the lowering has to
change a little as the instructions are terminators that produce a value
- something that needs to be treated carefully.

Lowering starts at the Hardware Loop pass, inserting a new
llvm.test.start.loop.iterations that produces both an i1 to control the
loop entry and an i32 similar to the llvm.start.loop.iterations
intrinsic added for do loops. This feeds into the loop phi, properly
gluing the values together:

  %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
  %wls0 = extractvalue { i32, i1 } %wls, 0
  %wls1 = extractvalue { i32, i1 } %wls, 1
  br i1 %wls1, label %loop.ph, label %loop.exit
...
loop:
  %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
  ..
  %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
  %cmp = icmp ne i32 %iv.next, 0
  br i1 %cmp, label %loop, label %loop.exit

The llvm.test.start.loop.iterations need to be lowered through ISel
lowering as a pair of WLS and WLSSETUP nodes, which each get converted
to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent
t2WhileLoopStart from being a terminator that produces a value,
something difficult to control at that stage in the pipeline. Instead
the t2WhileLoopSetup produces the value of LR (essentially acting as a
lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc).

These are then converted into a single t2WhileLoopStartLR at the same
point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop
to prevent them from progressing further in the pipeline. The
t2WhileLoopStartLR is a single instruction that takes a GPR and produces
LR, similar to the WLS instruction.

  %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
  t2B %bb.1
...
bb.2.loop:
  %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
  ...
  %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
  t2B %bb.3

The t2WhileLoopStartLR can then be treated similar to the other low
overhead loop pseudos, eventually being lowered to a WLS providing the
branches are within range.

Differential Revision: https://reviews.llvm.org/D97729
2021-03-11 17:56:19 +00:00
Hiroshi Yamauchi 365b225d46 [PGO] Fix two issues in PGOMemOPSizeOpt.
1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from
the value profile metadata and preserves the remaining entries for the fallback
memop call site. If there are more than five entries, the rest of the entries
would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes
up to 3 (by default) values, but potentially not for other downstream passes
that may use the value profile metadata.

2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept
track of. When the range buckets were introduced, it was changed to skip the
range buckets, but since it does not grab all entries (only five), if some range
buckets exist in the first five entries, it could potentially cause fewer
promotion opportunities (eg. if 4 out of 5 were range buckets, it may be able to
promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it
means that wrong entries may be preserved, as it didn't correctly keep track of
which were entries were skipped.

To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number
of value profile buckets), keeps track of which entries were skipped, and
preserves all the remaining entries.

Differential Revision: https://reviews.llvm.org/D97592
2021-03-11 09:53:05 -08:00
Nikita Popov 6312c53870 [IRBuilder] Deprecate CreateLoad APIs with implicit type
These APIs are not compatible with opaque pointers. Deprecate them
to avoid the introduction of further uses.
2021-03-11 18:46:45 +01:00
Craig Topper e9426dfbae [ValueTypes][RISCV] Add MVT for v1f16.
RISCV makes all fixed vector MVTs with size less than or equal
to a command line option legal.

This didn't include v1f16 because it was missing but did include v1f32 and v1f64.

One test is affected where we did test this type, but it is a horizontal
reduction so it is non-sensical. Perhaps we should canonicalize that
away somewhere.

I'm not sure if we should be making v1 types legal, but this will at
least make RISCV consistent across all types.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98365
2021-03-11 09:23:18 -08:00
Joseph Huber 807466ef28 [OpenMP] Restore backwards compatibility for libomptarget
Summary:
The changes introduced in D87946 changed the API for libomptarget
functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x
but was not given a backward-compatible interface. This change will
require people using Clang 13.x or 12.x to recompile their offloading
programs.

Reviewed By: jdoerfert cchen

Differential Revision: https://reviews.llvm.org/D98358
2021-03-11 09:52:11 -05:00
Stephen Tozer f40976bd01 Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This reverts commit c0f3dfb9f1.

Reverted due to an error on the clang-x64-windows-msvc buildbot.
2021-03-11 14:48:01 +00:00
Simon Pilgrim cc48b45d24 [llvm-mca] Fix uninitialized variable in InOrderIssueStage constructor warning. NFCI. 2021-03-11 14:41:20 +00:00
Simon Pilgrim 9a259f4386 [Transforms] SampleProfileLoaderBaseImpl<BT>::getFunctionLoc - fix Wdocumentation warnings. NFCI. 2021-03-11 14:04:08 +00:00
gbtozers c0f3dfb9f1 [DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands
This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic
operations with two or more non-const operands; this includes the GetElementPtr
instruction, and most Binary Operator instructions. These salvages produce
DIArgList locations and are only valid for dbg.values, as currently variadic
DIExpressions must use DW_OP_stack_value. This functionality is also only added
for salvageDebugInfoForDbgValues; other functions that directly call
salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be
updated in a later patch.

Differential Revision: https://reviews.llvm.org/D91722
2021-03-11 13:33:49 +00:00
Serguei Katkov 0480927712 [Statepoint Lowering] Handle the case with several gc.result
Recently gc.result has been marked with readnone instead of readonly and
this opens a door for different optimization to duplicate gc.result.
Statepoint lowering is not ready to see several gc.results.
The problem appears when there are gc.results with one located in the same
basic block and another located in other basic block.
In this case we need both export VR and fill local setValue.

Note that this case is not sufficient optimization done before CodeGen.
It is evident that local gc.result dominates all other gc.results and it is handled
by GVN and EarlyCSE.

But anyway, even if IR is not optimal Backend should not crash on a valid IR.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D98393
2021-03-11 18:44:44 +07:00
Simon Pilgrim e74d626925 [IPO] Fix EXPENSIVE_CHECKS assert added at D83744. NFCI.
It wasn't taking into account that QueryingAA was a pointer.
2021-03-11 10:29:15 +00:00
Nikita Popov 403da6a69a Reapply [LICM] Make promotion faster
Relative to the previous implementation, this always uses
aliasesUnknownInst() instead of aliasesPointer() to correctly
handle atomics. The added test case was previously miscompiled.

-----

Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-11 10:50:28 +01:00
Djordje Todorovic 9f41c03f82 [Debugify][OriginalDIMode] Export the report into JSON file
By using the original-di check with debugify in the combination with
the llvm/utils/llvm-original-di-preservation.py it becomes very user
friendly tool. An example of the HTML page with the issues
related to debug info can be found at [0].

[0] https://djolertrk.github.io/di-checker-html-report-example/

Differential Revision: https://reviews.llvm.org/D82546
2021-03-11 01:11:13 -08:00
Petr Hosek c7712087cb [InstrProfiling] Don't generate __llvm_profile_runtime_user
This is no longer needed, we can add __llvm_profile_runtime directly
to llvm.compiler.used or llvm.used to achieve the same effect.

Differential Revision: https://reviews.llvm.org/D98325
2021-03-10 22:33:51 -08:00
Reid Kleckner b69db4a7ab Re-land "[PDB] Defer relocating .debug$S until commit time and parallelize it"
This reverts commit bacf9cf2c5 and
reinstates commit 1a9bd5b813.

Reverting this commit did not appear to make the problem go away, so we
can go ahead and reland it.
2021-03-10 15:14:09 -08:00
kuterd d75c9e61a5 [Attributor] Attributor call site specific AAValueConstantRange
This patch makes uses of the context bridges introduced in D83299 to make
AAValueConstantRange call site specific.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83744
2021-03-11 01:19:44 +03:00
Quentin Colombet 49942c6d4a [NFC] Fix a compiler warning
Fix a warning caused by -Wrange-loop-analysis

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D98297
2021-03-10 13:28:53 -08:00
Alexey Lapshin 4f16e177e1 [llvm-objcopy][NFC] replace class Buffer/MemBuffer/FileBuffer with streams.
During D88827 it was requested to remove the local implementation
of Memory/File Buffers:

// TODO: refactor the buffer classes in LLVM to enable us to use them here
// directly.

This patch uses raw_ostream instead of Buffers. Generally, using streams
could allow us to reduce memory usages. No need to load all data into the
memory - the data could be streamed through a smaller buffer.
Thus, this patch uses raw_ostream as an interface for output data:

Error executeObjcopyOnBinary(CopyConfig &Config,
                             object::Binary &In,
                             raw_ostream &Out);

Note 1. This patch does not change the implementation of Writers
so that data would be directly stored into raw_ostream.
This is assumed to be done later.

Note 2. It would be better if Writers would be implemented in a such way
that data could be streamed without seeking/updating. If that would be
inconvenient then raw_ostream could be replaced with raw_pwrite_stream
to have a possibility to seek back and update file headers.
This is assumed to be done later if necessary.

Note 3. Current FileOutputBuffer allows using a memory-mapped file.
The raw_fd_ostream (which could be used if data should be stored in the file)
does not allow us to use a memory-mapped file. Memory map functionality
could be implemented for raw_fd_ostream:

It is possible to add resize() method into raw_ostream.

class raw_ostream {
  void resize(uint64_t size);
}

That method, implemented for raw_fd_ostream, could create a memory-mapped file.
The streamed data would be written into that memory file then.
Thus we would be able to use memory-mapped files with raw_fd_ostream.
This is assumed to be done later if necessary.

Differential Revision: https://reviews.llvm.org/D91028
2021-03-10 23:50:04 +03:00
Sriraman Tallam 0ba1ebcbb7 Remove original implementation of UniqueInternalLinkageNames pass.
D96109 was recently submitted which contains the refactored implementation of
-funique-internal-linakge-names by adding the unique suffixes in clang rather
than as an LLVM pass. Deleting the former implementation in this change.

Differential Revision: https://reviews.llvm.org/D98234
2021-03-10 11:57:40 -08:00
Quentin Colombet 66dab2fa84 [NFC] Fix compiler warnings
Fix warnings caused by -Wrange-loop-analysis.

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D98298
2021-03-10 11:03:50 -08:00
Craig Topper 9106d04554 [RISCV][SelectionDAG] Introduce an ISD::SPLAT_VECTOR_PARTS node that can represent a splat of 2 i32 values into a nxvXi64 vector for riscv32.
On riscv32, i64 isn't a legal scalar type but we would like to
support scalable vectors of i64.

This patch introduces a new node that can represent a splat made
of multiple scalar values. I've used this new node to solve the current
crashes we experience when getConstant is used after type legalization.

For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS
when needed and then handling the SPLAT_VECTOR_PARTS later during
LegalizeOps. I've remove the special case I previously put in for
ABS for D97991 as the default expansion is now able to succesfully
use getConstant.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98004
2021-03-10 09:46:18 -08:00
Craig Topper 0c73a506e8 [RISCV] Starting fixing issues that prevent us from testing vXi64 intrinsics on RV32.
Currently we crash in type legalization any time an intrinsic
uses a scalar i64 on RV32.

This patch adds support for type legalizing this to prevent
crashing. I don't promise that it uses the best possible codegen
just that it is functional.

This first version handles 3 cases. vmv.v.x intrinsic, vmv.s.x
intrinsic and intrinsics that take a scalar input, splat it and
then do some operation.

For vmv.v.x we'll either rely on hardware sign extension for
constants or we'll convert it to multiple splats and bit
manipulation.

For vmv.s.x we use a really unoptimal sequence inspired by what
we do for an INSERT_VECTOR_ELT.

For the third case we'll either try to use the .vi form for
constants or convert to a complicated splat and bitmanip and use
the .vv form of the operation.

I've renamed the ExtendOperand field to SplatOperand now use it
specifically for the third case. The first two cases are handled
by custom lowering specifically for those intrinsics.

I haven't updated all tests yet, but I tried to cover a subset
that includes single-width, widening, and narrowing.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97895
2021-03-10 09:45:38 -08:00
Ta-Wei Tu 7ff2768be1 Revert "[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`"
This reverts commit df9158c9a4.
2021-03-11 01:24:43 +08:00
Stephen Tozer 1db137b185 [DebugInfo] Handle DBG_VALUES with multiple variable location operands in MIR
This patch adds handling for DBG_VALUE_LIST in the MIR-passes (after
finalize-isel), excluding the debug liveness passes and DWARF emission. This
most significantly affects MachineSink, which now needs to consider all used
registers of a debug value when sinking, but for most passes this change is
simply replacing getDebugOperand(0) with an iteration over all debug operands.

Differential Revision: https://reviews.llvm.org/D92578
2021-03-10 17:15:24 +00:00
Jingu Kang 25951c5ab8 [AArch64] Add missing intrinsics for scalar FP rounding
Differential Revision: https://reviews.llvm.org/D98269
2021-03-10 13:22:29 +00:00
Christudasan Devadasan 4c6ab48fb1 GlobalISel: Try to combine G_[SU]DIV and G_[SU]REM
It is good to have a combined `divrem` instruction when the
`div` and `rem` are computed from identical input operands.
Some targets can lower them through a single expansion that
computes both division and remainder. It effectively reduces
the number of instructions than individually expanding them.

Reviewed By: arsenm, paquette

Differential Revision: https://reviews.llvm.org/D96013
2021-03-10 18:46:07 +05:30
Alex Richardson 35bf23e965 Avoid shuffle self-assignment in EXPENSIVE_CHECKS builds
Some versions of libstdc++ perform self-assignment in std::shuffle. This
breaks the EXPENSIVE_CHECKS builds of TableGen due to an incorrect assertion
in libstdc++.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85828.

Fixes https://llvm.org/PR37652

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98167
2021-03-10 11:17:34 +00:00
Vladislav Vinogradov dc8446c2a0 [ADT][NFC] Use `size_t` type for index in `indexed_accessor_range`
It makes it consistent with `size()` method return type and with
STL-like containers API.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D97921
2021-03-10 11:40:59 +03:00
Wei Mi ee35784a90 [SampleFDO] Support enabling -funique-internal-linkage-name.
now -funique-internal-linkage-name flag is available, and we want to flip
it on by default since it is beneficial to have separate sample profiles
for different internal symbols with the same name. As a preparation, we
want to avoid regression caused by the flip.

When we flip -funique-internal-linkage-name on, the profile is collected
from binary built without -funique-internal-linkage-name so it has no uniq
suffix, but the IR in the optimized build contains the suffix. This kind of
mismatch may introduce transient regression.

To avoid such mismatch, we introduce a NameTable section flag indicating
whether there is any name in the profile containing uniq suffix. Compiler
will decide whether to keep uniq suffix during name canonicalization
depending on the NameTable section flag. The flag is only available for
extbinary format. For other formats, by default compiler will keep uniq
suffix so they will only experience transient regression when
-funique-internal-linkage-name is just flipped.

Another type of regression is caused by places where we miss to call
getCanonicalFnName. Those places are fixed.

Differential Revision: https://reviews.llvm.org/D96932
2021-03-09 21:41:40 -08:00
Amara Emerson 55e760769b [GlobalISel] Fold away G_BUILD_VECTOR with all elements extracted.
If every element is extracted from a G_BUILD_VECTOR, pass through the source
registers. This is different to the extract(build_vector) combine because this
one tolerates multiple users as long as they're exhaustive.

Differential Revision: https://reviews.llvm.org/D97890
2021-03-09 11:34:26 -08:00
Amara Emerson e60ab72137 [AArch64][GlobalISel] Add combine for extract_vector_elt(build_vector, cst)
Differential Revision: https://reviews.llvm.org/D97835
2021-03-09 11:08:02 -08:00
Fangrui Song 42e3f97a9d [MC] Change ELFOSABI_NONE to ELFOSABI_GNU for SHF_GNU_RETAIN
GNU ld does not give SHF_GNU_RETAIN GC root semantics for ELFOSABI_NONE.
(https://sourceware.org/pipermail/binutils/2021-March/115581.html)

This allows GNU ld to interpret SHF_GNU_RETAIN and avoids a gold quirk
https://sourceware.org/bugzilla/show_bug.cgi?id=27490

Because ELFObjectWriter is in an anonymous namespace, I have to place
`markGnuAbi` in the parent MCObjectWriter.

Differential Revision: https://reviews.llvm.org/D97976
2021-03-09 09:59:47 -08:00
Nikita Popov 55ae279ba7 [FastISel] Don't trivially kill extractvalues (PR49467)
All extractvalues of the same value at the same index will map to
the same register, so even if one specific extractvalue only has
one use, we should not mark it as a trivial kill, as there may be
more extractvalues later.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49467.

Differential Revision: https://reviews.llvm.org/D98145
2021-03-09 18:46:38 +01:00
gbtozers f0513413c7 [DebugInfo] Add replaceArg function to simplify DBG_VALUE_LIST expressions
The LiveDebugValues and LiveDebugVariables implementations for handling
DBG_VALUE_LIST instructions can be simplified significantly if they do not have
to deal with any duplicated operands, such as a DBG_VALUE_LIST that uses the
same register multiple times in its expression. This patch adds a function,
replaceArg, that can be used to simplify a DIExpression in the case of
duplicated operands.

Differential Revision: https://reviews.llvm.org/D83896
2021-03-09 17:41:04 +00:00
Dave Lee 736afe465f Revert "[build][modules] Fix ObjCARCUtil.h modularization"
This reverts commit f1b690598e.
2021-03-09 09:36:47 -08:00
gbtozers df69c69427 [DebugInfo] Handle multiple variable location operands in IR
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.

Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.

Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 16:44:38 +00:00
Stefan Gränitz 265bc5af7b [Orc] Always check mapped sections for ELFDebugObject are in bounds of working memory buffer
As stated in the JITLink user guide: Do not assume that the input object is well formed.
https://llvm.org/docs/JITLink.html#tips-for-jitlink-backend-developers
2021-03-09 14:01:50 +01:00
Cullen Rhodes 2750f3ed31 [IR] Introduce llvm.experimental.vector.splice intrinsic
This patch introduces a new intrinsic @llvm.experimental.vector.splice
that constructs a vector of the same type as the two input vectors,
based on a immediate where the sign of the immediate distinguishes two
variants. A positive immediate specifies an index into the first vector
and a negative immediate specifies the number of trailing elements to
extract from the first vector.

For example:

  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>  ; index
  @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count

These intrinsics support both fixed and scalable vectors, where the
former is lowered to a shufflevector to maintain existing behaviour,
although while marked as experimental the recommended way to express
this operation for fixed-width vectors is to use shufflevector. For
scalable vectors where it is not possible to express a shufflevector
mask for this operation, a new ISD node has been implemented.

This is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker and Cullen Rhodes.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D94708
2021-03-09 10:44:22 +00:00
gbtozers 93b170ea24 [DebugInfo] Handle dbg.values with multiple variable location operands in ISel
This patch adds partial support in Instruction Selection for dbg.values that use
a DIArgList. This patch does not add support for producing DBG_VALUE_LIST, but
adds the logic for processing DIArgLists within the ISel pass. This change is
largely focused on handleDebugValue and some of the functions that it calls.
Outside of this, salvageDebugInfo and transferDbgValues have been modified to
replace individual operands instead of the entire value; dangling debug info for
variadic debug values is not currently supported (but may be added later).

Differential Revision: https://reviews.llvm.org/D88589
2021-03-09 09:48:03 +00:00
Jan Kratochvil 4289a7f1d7 llvm-dwarfdump: Fix DWARF-5 DW_FORM_implicit_const (used by GCC)
Differential Revision: https://reviews.llvm.org/D98195
2021-03-09 09:26:58 +01:00
Akira Hatanaka dca5737945 Move ObjCARCUtil.h back to llvm/Analysis
Instead of adding the header to llvm/IR, just duplicate the marker
string in the auto upgrader.
2021-03-08 16:35:24 -08:00
Rahman Lavaee c245c21c43 [llvm-readelf] Support dumping the BB address map section with --bb-addr-map.
This patch lets llvm-readelf dump the content of the BB address map
section in the following format:
```
Function {
  At: <address>
  BB entries [
    {
      Offset:   <offset>
      Size:     <size>
      Metadata: <metadata>
    },
    ...
  ]
}
...
```

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D95511
2021-03-08 16:20:11 -08:00
Dave Lee f1b690598e [build][modules] Fix ObjCARCUtil.h modularization
This change was introduced by this fix:
a2a55def35.

This makes a submodule for ObjCARCUtil.h, because leaving it in
LLVM_intrinsic_gen causes a dependency cycle between LLVM_IR and
LLVM_intrinsic_gen.
2021-03-08 15:51:27 -08:00
Benjamin Kramer 0d96ea0792 [ValueTracking] Move matchSimpleRecurrence out of line
The header only has a forward declaration of PHINode available, and this
function doesn't seem to get much out of inlining.
2021-03-09 00:04:47 +01:00
Sanjay Patel 34d0d644ff [ValueTracking] move/add helper to get inverse min/max; NFC
We will need to this functionality to improve min/max folds
in instcombine when we canonicalize to intrinsics.
2021-03-08 17:38:22 -05:00
wlei c460ef61d6 [CSSPGO][llvm-profgen] Change sample count of dangling probe in llvm-profgen
Differential Revision: https://reviews.llvm.org/D96811
2021-03-08 14:36:02 -08:00
Masoud Ataei 820f508b08 [PowerPC] Removing _massv place holder
Since P8 is the oldest machine supported by MASSV pass,
_massv place holder is removed and the oldest version of
MASSV functions is assumed. If the P9 vector specific is
detected in the compilation process, the P8 prefix will
be updated to P9.

Differential Revision: https://reviews.llvm.org/D98064
2021-03-08 21:43:24 +00:00
Jessica Paquette 5c26be214d [AArch64][GlobalISel] Lower G_BUILD_VECTOR -> G_DUP
If we have

```
%vec = G_BUILD_VECTOR %reg, %reg, ..., %reg
```

Then lower it to

```
%vec = G_DUP %reg
```

Also update the selector to handle constant splats on G_DUP.

This will not combine when the splat is all zeros or ones. Tablegen-imported
patterns rely on these being G_BUILD_VECTOR.

Minor code size improvements on CTMark at -Os.

Also adds some utility functions to make it a bit easier to recognize splats,
and an AArch64-specific splat helper.

Differential Revision: https://reviews.llvm.org/D97731
2021-03-08 13:01:10 -08:00
Alina Sbirlea 29482426b5 Revert "[LICM] Make promotion faster"
Revert 3d8f842712
Revision triggers a miscompile sinking a store incorrectly outside a
threading loop. Detected by tsan.
Reverting while investigating.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-08 12:53:03 -08:00
Min-Yih Hsu 8dddc15297 [M68k](4/8) MC layer and object file support
- Add the M68k-specific MC layer implementation
 - Add ELF support for M68k
 - Add M68k-specifc CC and reloc

TODO: Currently AsmParser and disassembler are not implemented yet.
Please use this bug to track the status:
https://bugs.llvm.org/show_bug.cgi?id=48976

Authors: myhsu, m4yers, glaubitz

Differential Revision: https://reviews.llvm.org/D88390
2021-03-08 12:30:57 -08:00
Min-Yih Hsu bec7b16692 [M68k](3/8) Skeleton and target description files
- Infrastructure for the target (i.e. build files, target triple etc.)
 - All of the target description TableGen file

Authors: myhsu, m4yers, glaubitz

Differential Revision: https://reviews.llvm.org/D88389
2021-03-08 12:30:57 -08:00
Min-Yih Hsu 6dcc325ce0 [M68k][MIR](2/8) Changes in the target-independent MIR part
- Add new callback in `TargetInstrInfo` --
  `isPCRelRegisterOperandLegal` -- to query whether pc-rel
   register MachineOperand is legal.
 - Add new function to search DebugLoc in a reverse ordering

Authors: myhsu, m4yers, glaubitz

Differential Revision: https://reviews.llvm.org/D88386
2021-03-08 12:30:57 -08:00
Min-Yih Hsu 503343191e [M68k][TableGen](1/8) TableGen related changes
- Add a new TableGen backend: CodeBeads
 - Add support to generate logical operand information

For the first item, it is currently a workaround of M68k's (complex)
instruction encoding. A typical architecture, especially CISC one like
X86, normally uses `MCInstrDesc::TSFlags` to carry instruction encoding
info. However, at the early days of M68k backend development, we found
it difficult to fit every possible encoding into the 64-bit
`MCInstrDesc::TSFlags`. Therefore CodeBeads was invented to provide
an alternative, arbitrary length container for instruciton encoding
info. However, in the long term we incline not to use a new TG
backend for less common pattern like what we encountered in M68k. A bug
has been created to host to discussion on migrating from CodeBeads to
more concise solution: https://bugs.llvm.org/show_bug.cgi?id=48792

The second item was also served for similar purpose. It created utility
functions that tell you the index of a `MachineOperand` in a
`MachineInst` given a logical operand index. In normal cases a logical
operand is the same as `MachineOperand`, but for operands using complex
addressing mode a logical operand might be consisting of multiple
`MachineOperand`. The TableGen-ed `getLogicalOperandIdx`, for instance,
can give you the mapping between these two concepts. Nevertheless, we
hope to remove this feature in the future if possible. Since it's not
really useful for the targets supported by LLVM now either.

Authors: myhsu, m4yers, glaubitz

Differential Revision: https://reviews.llvm.org/D88385
2021-03-08 12:30:56 -08:00
Victor Huang 621023b218 [AIX][TLS] Add assert check of valid csect type for the storage mapping class XCOFF::XMC_UL
This patch adds the assert check inside the constructor for the csect (MCSectionXCOFF) to ensure
valid csect type used for the storage mappping class XCOFF:XMC_UL.
2021-03-08 14:18:57 -06:00
Yuta Saito aa0c571a5f [WebAssembly] Add new relocation for location relative data
This `R_WASM_MEMORY_ADDR_SELFREL_I32` relocation represents an offset
between its relocating address and the symbol address. It's very similar
to `R_X86_64_PC32` but restricted to be used for only data segments.

```
S + A - P
```

A: Represents the addend used to compute the value of the relocatable
field.
P: Represents the place of the storage unit being relocated.
S: Represents the value of the symbol whose index resides in the
relocation entry.

Proposal: https://github.com/WebAssembly/tool-conventions/issues/162

Differential Revision: https://reviews.llvm.org/D96659
2021-03-08 11:34:10 -08:00
Philip Reames d9a29a6752 constify getUnderlyingObject implementation [nfc] 2021-03-08 11:32:54 -08:00
Philip Reames 239a618180 [instcombine] Collapse trivial and recurrences
If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.

Differential Revision: https://reviews.llvm.org/D97578 (and part)
2021-03-08 09:21:38 -08:00
Philip Reames 97a7bc5831 [gvn] Precisely propagate equalities to phi operands
The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use).

Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify.

Differential Revision: https://reviews.llvm.org/D98082
2021-03-08 08:59:00 -08:00
gbtozers e5d958c456 [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.

Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
2021-03-08 14:36:13 +00:00
Ta-Wei Tu df9158c9a4 [LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`
The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`.
In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour.
Replacing it with the much robust version provided by `LoopNest` reduces code duplications and fixes https://bugs.llvm.org/show_bug.cgi?id=48113.

`LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`.
Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97290
2021-03-08 11:36:08 +08:00
Mehdi Amini fe9a4b55da Fix build post-revert in 8d5a981a13
One commit introduced after the reverted change was using an API
introduced there, this is reintroducing the API, but not the original
broken change.
2021-03-08 00:59:05 +00:00
Mehdi Amini 8d5a981a13 Revert "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe"
This reverts commit 99108c791d.
Clang is miscompiling LLVM with this change, a stage-2 build hits
multiple failures.

As a repro, I built clang in a stage1 directory and used it this way:

cmake -G Ninja ../llvm \
  -DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \
  -DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_BUILD_EXAMPLES=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=On
ninja check-mlir
2021-03-08 00:15:47 +00:00
Juneyoung Lee 99108c791d [SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe
This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG.
It is known that merging conditions into and/or is poison-unsafe, and this is towards making things *more* correct by removing possible miscompilations.
Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future.
There has been efforts for updating optimizations to support the select form as well, and they are linked to D93065.

The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true.
Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact in performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95026
2021-03-08 01:38:03 +09:00
Fangrui Song bb6732cf62 [MC] Add parseEOL() overload and migrate some parseToken(AsmToken::EndOfStatement) to parseEOL()
For many directives, the following diagnostics

* `error: unexpected token`
* `error: unexpected token in '.abort' directive"`

are replaced with `error: expected newline`.

`unexpected token` may make the user think a different token is needed.
`expected newline` is clearer about the expected token.

For `in '...' directive`, the directive name is not useful because the next line
replicates the error line which includes the directive.
2021-03-06 17:45:23 -08:00
Fangrui Song d96af2ed2d [MC] Support .symver *, *, remove
As a resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=25295 , GNU as
from binutils 2.35 supports the optional third argument for the .symver directive.

'remove' for a non-default version is useful:
`.symver def_v1, def@v1, remove` => def_v1 is not retained in the symbol table.
Previously the user has to strip the original symbol or specify a `local:`
version node in a version script to localize the symbol.

`.symver def, def@@v1, remove` and `.symver def, def@@@v1, remove` are supported
as well, though they are identical to `.symver def, def@@@v1`.

local/hidden are not useful so this patch does not implement them.
2021-03-06 15:23:02 -08:00
Vy Nguyen f8b01d54c3 Reland 293e8fa13d
[llvm-exegesis] Disable the LBR check on AMD

    https://bugs.llvm.org/show_bug.cgi?id=48918

    The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW.
    Theory we've got is that the lbr-checking code could be confused on AMD.

    Differential Revision: https://reviews.llvm.org/D97504

New change:
 - Surround usages of x86 helper in llvm-exegesis/X86/Target.cpp with ifdef
 - Fix bug which caused the caller of getVendorSignature to not have a copy of EAX that it expected.
2021-03-05 13:23:42 -05:00
Philip Reames f352463ade Mark gc.relocate and gc.result as readnone
For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism.  gc.relocate and gc.result are simply projections of the return values from the associated statepoint.  Note that the LangRef has always declared them readnone.

The EarlyCSE change is simply moving the special casing from readonly to readnone handling.

As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice.

This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing.  The motivation is to enable the GVN changes in that patch.
2021-03-05 10:07:17 -08:00
gbtozers 65600cb2a7 [DebugInfo] Add DIArgList MD to store multple values in DbgVariableIntrinsics
This patch adds a new metadata node, DIArgList, which contains a list of SSA
values. This node is in many ways similar in function to the existing
ValueAsMetadata node, with the difference being that it tracks a list instead of
a single value. Internally, it uses ValueAsMetadata to track the individual
values, but there is also a reasonable amount of DIArgList-specific
value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special
case in parsing and printing due to the fact that it requires a function state
(as it may reference function-local values).

This patch should not result in any immediate functional change; it allows for
DIArgLists to be parsed and printed, but debug variable intrinsics do not yet
recognize them as a valid argument (outside of parsing).

Differential Revision: https://reviews.llvm.org/D88175
2021-03-05 17:02:24 +00:00
Stephen Tozer f677413071 Reapply "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values"
Rewrites test to use correct architecture triple; fixes incorrect
reference in SourceLevelDebugging doc; simplifies `spillReg` behaviour
so as to not be dependent on changes elsewhere in the patch stack.

This reverts commit d2000b45d0.
2021-03-05 12:32:05 +00:00
Jingu Kang 9b302513f6 [AArch64] Add missing intrinsics for vrnd 2021-03-05 11:26:12 +00:00
Simon Pilgrim 6955524c2f Fix Wdocumentation unknown parameter warning. NFCI. 2021-03-05 11:24:44 +00:00
Simon Pilgrim 3fd2fa1220 Revert rG8198d83965ba4b9db6922b44ef3041030b2bac39: "[X86] Pass to transform amx intrinsics to scalar operation."
This reverts commit 8198d83965ba4b9db6922b44ef3041030b2bac39.due to buildbot breakages
2021-03-05 11:09:14 +00:00
Andy Wingo a5a3659de7 [WebAssembly][yaml2obj][obj2yaml] Elem sections for nonzero tables
With reference types, tables can have non-zero table numbers.  This
commit adds support for element sections against these tables.

Differential Revision: https://reviews.llvm.org/D97923
2021-03-05 11:45:15 +01:00
Petar Avramovic d44f61f81c Reland [GlobalISel] Combine zext(trunc x) to x
Recommit 4112299ee7. Depends on
4c8fb7ddd6 which was reverted.

Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-05 11:05:37 +01:00
Luo, Yuanke 8198d83965 [X86] Pass to transform amx intrinsics to scalar operation.
This pass runs in any situations but we skip it when it is not O0 and the
function doesn't have optnone attribute. With -O0, the def of shape to amx
intrinsics is near the amx intrinsics code. We are not able to find a
point which post-dominate all the shape and dominate all amx intrinsics.
To decouple the dependency of the shape, we transform amx intrinsics
to scalar operation, so that compiling doesn't fail. In long term, we
 should improve fast register allocation to allocate amx register.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D93594
2021-03-05 16:02:02 +08:00
Fangrui Song 063b19dea6 [DebugInfo] Delete unused DIVariable::getSource 2021-03-04 21:53:13 -08:00
Fangrui Song 9e28b89827 [DebugInfo] Delete deleted getLine/getColumn
r250405 deleted the functions from DILexicalBlockBase.
2021-03-04 21:49:21 -08:00
Michael Kruse b119120673 [clang][OpenMP] Use OpenMPIRBuilder for workshare loops.
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
 * Only the worksharing-loop directive.
 * Recognizes only the nowait clause.
 * No loop nests with more than one loop.
 * Untested with templates, exceptions.
 * Semantic checking left to the existing infrastructure.

This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
 * The distance function: How many loop iterations there will be before entering the loop nest.
 * The loop variable function: Conversion from a logical iteration number to the loop variable.

These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.

The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.

For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.

Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D94973
2021-03-04 22:52:59 -06:00
Wei Mi 2357d29335 [SampleFDO] Another fix to prevent repeated indirect call promotion in
sample loader pass.

In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
to prevent repeated indirect call promotion for the same indirect call
and the same target, we used zero-count value profile to indicate an
indirect call has been promoted for a certain target. We removed
PromotedInsns cache in the same patch. However, there was a problem in
that patch described below, and that problem led me to add PromotedInsns
back as a mitigation in
https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf.

When we get value profile from metadata by calling getValueProfDataFromInst,
we need to specify the maximum possible number of values we expect to read.
We uses MaxNumPromotions in the last patch so the maximum number of value
information extracted from metadata is MaxNumPromotions. If we have many
values including zero-count values when we write the metadata, some of them
will be dropped when we read them because we only read MaxNumPromotions
values. It will allow repeated indirect call promotion again. We need to
make sure if there are values indicating promoted targets, those values need
to be saved in metadata with higher priority than other values.

The patch fixed that problem. We change to use -1 to represent the count
of a promoted target instead of 0 so it is easier to sort the values.
When we prepare to update the metadata in updateIDTMetaData, we will sort
the values in the descending count order and extract only MaxNumPromotions
values to write into metadata. Since -1 is the max uint64_t number, if we
have equal to or less than MaxNumPromotions of -1 count values, they will
all be kept in metadata. If we have more than MaxNumPromotions of -1 count
values, we will only save MaxNumPromotions such values maximally. In such
case, we have logic in place in doesHistoryAllowICP to guarantee no more
promotion in sample loader pass will happen for the indirect call, because
it has been promoted enough.

With this change, now we can remove PromotedInsns without problem.

Differential Revision: https://reviews.llvm.org/D97350
2021-03-04 18:44:12 -08:00
Chen Zheng 87bbf3d1f8 [XCOFF][DebugInfo] support DWARF for XCOFF for assembly output.
Reviewed By: jasonliu

Differential Revision: https://reviews.llvm.org/D95518
2021-03-04 21:07:52 -05:00
David Blaikie a2a55def35 Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering.
This is included from IR files, and IR doesn't/can't depend on Analysis
(because Analysis depends on IR).

Also fix the implementation - don't use non-member static in headers, as
it leads to ODR violations, inaccurate "unused function" warnings, etc.
And fix the header protection macro name (we don't generally include
"LIB" in the names, so far as I can tell).
2021-03-04 16:14:53 -08:00
Francis Visoiu Mistrih 365b78396a [Remarks] Emit variable info in auto-init remarks
This enhances the auto-init remark with information about the variable
that is auto-initialized.

This is based of debug info if available, or alloca names (mostly for
development purposes).

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

This allows to see things like partial initialization of a variable that
the optimizer won't be able to completely remove.

Differential Revision: https://reviews.llvm.org/D97734
2021-03-04 12:51:22 -08:00
Nicolas Guillemot 6b8cf7356c Revert "[Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream"
This reverts commit 7479a2e00b.

This commit causes compile errors on clang-x64-windows-msvc, so I'm
reverting the patch for now.

For reference, the error in question is:

```
error C2280: 'llvm::raw_ostream_iterator<char,char>
&llvm::raw_ostream_iterator<char,char>::operator =(const
llvm::raw_ostream_iterator<char,char> &)': attempting to reference a deleted
function

note: compiler has generated 'llvm::raw_ostream_iterator<char,char>::operator ='
here

note: 'llvm::raw_ostream_iterator<char,char>
&llvm::raw_ostream_iterator<char,char>::operator =(const
llvm::raw_ostream_iterator<char,char> &)': function was implicitly deleted
because 'llvm::raw_ostream_iterator<char,char>' has a data member
'llvm::raw_ostream_iterator<char,char>::OutStream' of reference type
```
2021-03-04 12:46:58 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Nicolas Guillemot 7479a2e00b [Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream
Adds a class `raw_ostream_iterator` that behaves like
std::ostream_iterator, but can be used with raw_ostream.
This is useful for using raw_ostream with std algorithms.

For example, it can be used to output std containers as follows:

```
std::vector<int> V = { 1, 2, 3 };
std::copy(V.begin(), V.end(), raw_ostream_iterator<int>(outs(), ", "));
// Output: "1, 2, 3, "
```

The API tries to follow std::ostream_iterator as closely as is
practically possible.

Reviewed By: dblaikie, mkitzan

Differential Revision: https://reviews.llvm.org/D78795
2021-03-04 11:12:48 -08:00
Adrian Prantl d268febc56 Improve the debug info for coro-split .resume functions
This patch updates the scope line to point to the suspend point. This
makes the first address in the function point to the first source line
in the resume function rather than the function declaration. Without
this the line table "jumps" from the beginning of the function to the
suspend point at the beginning.

rdar://73386346

Differential Revision: https://reviews.llvm.org/D97345
2021-03-04 11:05:35 -08:00
Albion Fung 36192790d8 [PowerPC][PC Rel] Implement option to omit Power10 instructions from stubs
Implemented the option to omit Power10 instructions from save stubs via the
option --no-power10-stubs or --power10-stubs=no on lld. --power10-stubs= will
override the other option. --power10-stubs=auto also exists to use the default
behaviour (ie allow Power10 instructions in stubs).

Differential Revision: https://reviews.llvm.org/D94627
2021-03-04 13:27:46 -05:00
Nico Weber 76148caa50 Revert "[llvm-exegesis] Disable the LBR check on AMD"
This reverts commit 293e8fa13d.
Breaks build on non-intel hosts, see e.g.
http://45.33.8.238/macm1/4600/step_3.txt
2021-03-04 11:48:33 -05:00
Xiangling Liao e9f9ec837d [CMake][AIX] Adjust plugin library extension used on AIX
As stated in the CMake manual, we are supposed to use MODULE rules to generate
plugin libraries:

"MODULE libraries are plugins that are not linked into other targets but may be
loaded dynamically at runtime using dlopen-like functionality"

Besides, LLVM's plugin infrastructure fits with the AIX treatment of .so
shared objects more than it fits with the AIX treatment of .a library archives
(which may contain shared objects).

Differential revision: https://reviews.llvm.org/D96282
2021-03-04 11:23:06 -05:00
Vy Nguyen 293e8fa13d [llvm-exegesis] Disable the LBR check on AMD
https://bugs.llvm.org/show_bug.cgi?id=48918

The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW.
Theory we've got is that the lbr-checking code could be confused on AMD.

Differential Revision: https://reviews.llvm.org/D97504
2021-03-04 11:16:38 -05:00
Sanjay Patel 36a489d194 [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
Similar to b3a33553ae, but this shows a TODO and a potential
miscompile is already present.

We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification may be possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.

The new test shows that there is an existing bug somewhere in
the callers. We assumed that the original code was fully 'fast'
and so we produced IR with 'fast' even though it was just 'reassoc'.
2021-03-04 10:40:26 -05:00
Nico Weber 59beb1ef6d Revert "[GlobalISel] Combine zext(trunc x) to x"
This reverts commit 4112299ee7.
Seems to depend on 4c8fb7ddd6 which
is being reverted.
2021-03-04 10:13:40 -05:00
Petar Avramovic 4112299ee7 [GlobalISel] Combine zext(trunc x) to x
Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-04 15:05:23 +01:00
Sanjay Patel b3a33553ae [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification seems possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.
2021-03-04 08:53:04 -05:00
Stephen Tozer d2000b45d0 Revert "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values"
This reverts commit d07f106f4a.
2021-03-04 11:59:21 +00:00
gbtozers d07f106f4a [DebugInfo] Add new instruction and DIExpression operator for variadic debug values
This patch adds a new instruction that can represent variadic debug values,
DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set
of basic code changes in MachineInstr and a few adjacent areas, but does not
correctly handle variadic debug values outside of these areas, nor does it
generate them at any point.

The new instruction is similar to the existing DBG_VALUE instruction, with the
following differences: the operands are in a different order, any number of
values may be used in the instruction following the Variable and Expression
operands (these are referred to in code as “debug operands”) and are indexed
from 0 so that getDebugOperand(X) == getOperand(X+2), and the Expression in a
DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator to pass arguments into the
expression.

The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a
DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the
index given by the argument onto the Expression stack. For example the
sub-expression `DW_OP_LLVM_arg, 0` has the meaning “Push the debug operand at
index 0 onto the expression stack.”

Differential Revision: https://reviews.llvm.org/D82363
2021-03-04 11:45:35 +00:00
Andrew Savonichev d791695cb5 [MCA] Add support for in-order CPUs
This patch adds a pipeline to support in-order CPUs such as ARM
Cortex-A55.

In-order pipeline implements a simplified version of Dispatch,
Scheduler and Execute stages as a single stage. Entry and Retire
stages are common for both in-order and out-of-order pipelines.

Differential Revision: https://reviews.llvm.org/D94928
2021-03-04 14:08:19 +03:00
Fraser Cormack c907681b07 [NFC] Fix typos in CallingConvLower.h 2021-03-04 10:29:14 +00:00
Hongtao Yu c75da238b4 [CSSPGO] Deduplicating dangling pseudo probes.
Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far.

This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.

An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.

Differential Revision: https://reviews.llvm.org/D97482
2021-03-03 22:44:42 -08:00
Hongtao Yu 8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Hongtao Yu ad2a59f584 [CSSPGO] Introducing dangling pseudo probes.
Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This logically equals to treating the instruction next to a probe as if it is from the same block of the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.

To mitigate this issue, we first try to move around a dangling probe inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a place holder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission.

If we are unlucky to find such in-block preceding instructions for a probe, the solution we are taking is to tag such probe as dangling so that the samples reported for them will not be trusted by the compiler. We leave it up to the counts inference algorithm to get such probes a reasonable count. The number `UINT64_MAX` is used to mark sample count as collected for a dangling probe.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95962
2021-03-03 22:44:41 -08:00
Johannes Doerfert 5b70c12f3e [Attributor] Make DepClass a required argument
We often used a sub-optimal dependence class in the past because we
didn't see the argument. Let's make it explicit so we remember to think
about it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert e592dad82e [Attributor] Fold "TrackDependence" into the DepClassTy enum
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and much easier to
use optional dependencies. Also avoids mistakes where the bool is
false and enum ignored.
2021-03-04 00:35:52 -06:00
Johannes Doerfert f3f88287c5 [Attributor] Use known alignment as lower bound to avoid work
If we know already more than available from a use, we don't need to
invest time on it.
2021-03-04 00:35:52 -06:00
Philip Reames 99f5417346 Sink routine for replacing a operand bundle to CallBase [NFC]
We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.
2021-03-03 12:07:55 -08:00
Fangrui Song a84f4fc0df [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker does not let `__start_/__stop_` references retain their sections.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on COFF/Mach-O are not affected.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D97649
2021-03-03 11:32:24 -08:00
Philip Reames e6e5ef40cb [basicaa] Fix a latent bug in isGEPBaseAtNegativeOffset
This was pointed out in review of D97520 by Nikita, but existed in the original code as well.

The basic issue is that a decomposed GEP expression describes (potentially) more than one getelementptr.  The "inbounds" derived UB which justifies this aliasing rule requires that the entire offset be composed of "inbounds" geps.  Otherwise, as can be seen in the recently added and changes in this patch test, we can end up with a large commulative offset with only a small sub-offset actually being "inbounds".  If that small sub-offset lies within the object, the result was unsound.

We could potentially be fancier here, but for the moment, simply be conservative when any of the GEPs parsed aren't inbounds.
2021-03-03 08:43:32 -08:00
Philip Reames dd9922c487 [basicaa] Minor indentation fix 2021-03-03 08:33:19 -08:00
Arnold Schwaighofer a42bea211a [coro async] Allow a coro.suspend.async to specify which argument is the context argument
Before we used the same argument as the entry point. The resume partial
function might want to use a different ABI for its context argument

Differential Revision: https://reviews.llvm.org/D97333
2021-03-03 08:27:37 -08:00
Nico Weber 64f5d7e972 Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF"
This reverts commit 04c3040f41.
Breaks instrprof-value-merge.c in bootstrap builds.
2021-03-03 10:21:17 -05:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Hans Wennborg ddf43e5130 revert llvm/include/llvm/Analysis/ObjCARCUtil.h part of 1cc558bd4f 2021-03-03 15:50:02 +01:00
Matt Arsenault 78dcff4841 GlobalISel: Add default implementation of assignValueToReg
Refactor insertion of the asserting ops. This enables using them for
AMDGPU.

This code should essentially be the same for every target. Mips, X86
and ARM all have different code there now, but this seems to be an
accident. The assignment functions are called with different types
than they would be in the DAG, so this is all likely an assortment of
hacks to get around that.
2021-03-03 09:29:53 -05:00
Piotr Sobczak 4672bac177 [AMDGPU] Introduce Strict WQM mode
* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active.
* The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96258
2021-03-03 14:19:16 +01:00
Piotr Sobczak c3ce7bae80 [AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm
 * Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix will become an important, distinguishing factor
between amdgcn_wqm and amdgcn_strictwqm in the future.

The "strict" prefix indicates that inactive lanes do not
take part in control flow, specifically an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.

The amdgcn_wwm will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96257
2021-03-03 09:33:57 +01:00
Carl Ritson 2ddac69f98 [AMDGPU] Rename llvm.amdgcn.msaa.load to llvm.amdgcn.msaa.load.x
While the underlying instruction is called image_msaa_load,
the resource must be x component only.
Rename the intrinsic for clarity.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D97829
2021-03-03 17:30:39 +09:00
Victor Huang 1756b2adc9 [AIX][TLS] Generate TLS variables in assembly files
This patch allows generating TLS variables in assembly files on AIX.
Initialized and external uninitialized variables are generated with the
.csect pseudo-op and local uninitialized variables are generated with
the .comm/.lcomm pseudo-ops. The patch also adds a check to
explicitly say that TLS is not yet supported on AIX.

Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile
Originally patched by: bsaleil
Commandeered by: NeHuang

Differential Revision: https://reviews.llvm.org/D96184
2021-03-02 18:22:48 -06:00
Arthur Eubanks 99f1e86cbb [opt] Error if -debug-pass is specified alongside the new PM
Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D97810
2021-03-02 15:59:28 -08:00
Nikita Popov 29034f3876 [AST] Remove unused Loop member (NFC)
To fix some build bots after D89264.
2021-03-02 23:11:51 +01:00
Nikita Popov 3d8f842712 [LICM] Make promotion faster
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-02 22:10:48 +01:00
Amara Emerson 8a316045ed [AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732
2021-03-02 12:55:51 -08:00
Sanjay Patel 415c67ba4c [SDAG] allow partial undef vector constants with select->logic folds
This is an enhancement suggested in the original review/commit:
D97730 / 7fce3322a2
2021-03-02 14:29:15 -05:00
Vy Nguyen 9a2e2de15f [lld-macho] Change loadReexport to handle the case where a TAPI re-exports to reference documents nested within other TBD.
Currently, it was delibrately impleneted to not handle this case, but as it has turnt out, we need this feature.
The concrete use case is
       `System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa` reexports
               /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit , which then rexports
                    /System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation

The current implemention uses a global currentTopLevelTapi, which is not reset until it finishes loading the whole tree.
This is a problem because if the top-level is set to Cocoa, then when we get to UIFoundation, it will try to find UIFoundation in the current top level, which is Cocoa and will not find it.

The right thing should be:
 - When loading a library from a TBD file, re-exports need to be looked up in the auxiliary documents within the same TBD.
 - When loading from an actual dylib, no additional TBD documents need to be examined.
 - In no case does a re-export mentioned in one TBD file need to be looked up in a document in an auxiliary document from a different TBD file

Differential Revision: https://reviews.llvm.org/D97438
2021-03-02 12:14:31 -05:00
Krzysztof Parzyszek d96b5e606a [TableGen] Add IntrNoMerge as intrinsic property
There is a function attribute 'nomerge' in addition to 'noduplicate'
and 'convergent'. Both 'noduplicate' and 'convergent' have corresponding
intrinsic properties. This patch adds an intrinsic property for the
'nomerge' attribute.

Differential Revision: https://reviews.llvm.org/D96364
2021-03-02 09:04:50 -08:00
dfukalov 6e967834b9 [AA] Cache (optionally) estimated PartialAlias offsets.
For the cases of two clobbering loads and one loaded object is fully contained
in the second `BasicAAResult::aliasGEP` returns just `PartialAlias` that
is actually more common case of partial overlap, it doesn't say anything about
actual overlapping sizes.

AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs
with non-constant offsets. The change stores estimated relative offsets so they
can be used further.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93529
2021-03-02 19:04:15 +03:00
Tim Northover 888c5c24ca AArch64: report fp16 arithmetic is present for apple-a11 CPU.
AArch64.td got it right, but the target-parser dropped it, leading to missing
feature flags in Clang.
2021-03-02 15:07:18 +00:00
Stefan Gränitz ef2389235c [Orc] Add JITLink debug support plugin for ELF x86-64
Add a new ObjectLinkingLayer plugin `DebugObjectManagerPlugin` and infrastructure to handle creation of `DebugObject`s as well as their registration in OrcTargetProcess. The current implementation only covers ELF on x86-64, but the infrastructure is not limited to that.

The journey starts with a new `LinkGraph` / `JITLinkContext` pair being created for a `MaterializationResponsibility` in ORC's `ObjectLinkingLayer`. It sends a `notifyMaterializing()` notification, which is forwarded to all registered plugins. The `DebugObjectManagerPlugin` aims to create a  `DebugObject` form the provided target triple and object buffer. (Future implementations might create `DebugObject`s from a `LinkGraph` in other ways.) On success it will track it as the pending `DebugObject` for the `MaterializationResponsibility`.

This patch only implements the `ELFDebugObject` for `x86-64` targets. It follows the RuntimeDyld approach for debug object setup: it captures a copy of the input object, parses all section headers and prepares to patch their load-address fields with their final addresses in target memory. It instructs the plugin to report the section load-addresses once they are available. The plugin overrides `modifyPassConfig()` and installs a JITLink post-allocation pass to capture them.

Once JITLink emitted the finalized executable, the plugin emits and registers the `DebugObject`. For emission it requests a new `JITLinkMemoryManager::Allocation` with a single read-only segment, copies the object with patched section load-addresses over to working memory and triggers finalization to target memory. For registration, it notifies the `DebugObjectRegistrar` provided in the constructor and stores the previously pending`DebugObject` as registered for the corresponding MaterializationResponsibility.

The `DebugObjectRegistrar` registers the `DebugObject` with the target process. `llvm-jitlink` uses the `TPCDebugObjectRegistrar`, which calls `llvm_orc_registerJITLoaderGDBWrapper()` in the target process via `TargetProcessControl` to emit a `jit_code_entry` compatible with the GDB JIT interface [1]. So far the implementation only supports registration and no removal. It appears to me that it wouldn't raise any new design questions, so I left this as an addition for the near future.

[1] https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D97335
2021-03-02 15:07:35 +01:00
Stefan Gränitz 48c2acff0c [JITLink] LinkGraph::getName() can be const 2021-03-02 15:07:34 +01:00
Jan Svoboda 4545813b17 [clang][cli] NFC: Rename marshalling multiclass
The new name drops `String` from `MarshallingInfoStringInt`, which follows the naming convention of other marshalling multiclasses.
2021-03-02 11:53:40 +01:00
Yuanfang Chen 5de2d189e6 [Diagnose] Unify MCContext and LLVMContext diagnosing
The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.

* Kill LLVMContext inlineasm diagnose handler and migrate it to use
  DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
  covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
  is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
  use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
  handler; if LLVMContext is not available, MCContext uses its own default
  diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D97449
2021-03-01 15:58:37 -08:00
Matt Arsenault 0131498402 GlobalISel: Remove dead code
Generic code should probably not introduce G_INSERT/G_EXTRACT. The
mirror unpackRegs should also be removed, but AMDGPU still has a use
remaining which needs to be fixed.
2021-03-01 17:06:43 -05:00
Fangrui Song 04c3040f41 [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker no longer lets `__start_/__stop_` references retain them.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on other COFF/Mach-O are not affected.

Differential Revision: https://reviews.llvm.org/D97649
2021-03-01 13:43:23 -08:00
Nicolas Guillemot 6fb6bdff37 Fix the value_type of defusechain_iterator to match its operator*()
defusechain_iterator has an operator*() and operator->() that return
references to a MachineOperand, but its "reference" and "pointer"
typedefs are set as if the iterator returns a MachineInstr reference.
This causes compilation errors when defusechain_iterator is used in
generic code that uses the "reference" and "pointer" typedefs.
This patch fixes this by updating the typedefs to use MachineOperand
instead of MachineInstr.

Reviewed By: mkitzan

Differential Revision: https://reviews.llvm.org/D97522
2021-03-01 10:41:10 -08:00
Arthur Eubanks 040c1b49d7 Move EntryExitInstrumentation pass location
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.

Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.

Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D97608
2021-03-01 10:08:10 -08:00
Jay Foad 216dee9170 [AMDGPU] Add IntrWillReturn to recently added intrinsics
This adds IntrWillReturn to the gfx90a mfma intrinsics, to match all the
other mfma intrinsics, and llvm.amdgcn.live.mask, to match
llvm.amdgcn.ps.live.

Differential Revision: https://reviews.llvm.org/D97675
2021-03-01 17:35:26 +00:00
Juneyoung Lee c89d9d8a48 [TTI] Consider select form of and/or i1 as having arithmetic cost
This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b`
as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`.

Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe.
This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked.
These selects should be translated into the assemblies as and/or i1 do in the same manner. The cost should be equivalent.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97360
2021-03-02 02:18:19 +09:00
Andy Wingo 2632ba6a35 [WebAssembly] call_indirect issues table number relocs
If the reference-types feature is enabled, call_indirect will explicitly
reference its corresponding function table via TABLE_NUMBER
relocations against a table symbol.

Also, as before, address-taken functions can also cause the function
table to be created, only with reference-types they additionally cause a
symbol table entry to be emitted.

Differential Revision: https://reviews.llvm.org/D90948
2021-03-01 16:49:00 +01:00
Masoud Ataei 5fe0cab79e [PowerPC] Removing sqrtd2 and sqrtf4 from list of vectorizable function with MASSV
Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call to be inlined.
Inline sqrt is faster than MASSV call on leppc.

Differential Revision: https://reviews.llvm.org/D97487
2021-03-01 15:42:19 +00:00
Jay Foad 796a60d2ea [AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32)
The expected use case is for frontends to insert this into
shaders that are to be run under a debugger. The shader can
then be resumed or single stepped from the point of the call
under debugger control.

Differential Revision: https://reviews.llvm.org/D97670
2021-03-01 14:30:23 +00:00
Matt Arsenault 6c260d3bc0 GlobalISel: Move splitToValueTypes to generic code
I copied the nearly identical function from AArch64 into AMDGPU, so
fix this duplication.

Mips and X86 have their own more exotic versions which should be
removed. However replacing those is better left for a separate patch
since it requires other changes to avoid regressions.
2021-03-01 08:58:18 -05:00
Florian Hahn 53dacb7b67
[LV] Generate RT checks up-front and remove them if required.
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Reviewed By: lebedev.ri, ebrevnov

Differential Revision: https://reviews.llvm.org/D75980
2021-03-01 10:48:04 +00:00
Fraser Cormack 6718fda6ad [CodeGen] Fix issues with subvector intrinsic index types
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.

Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.

The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D97459
2021-03-01 10:28:21 +00:00
Juneyoung Lee 5419b67137 [SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally
This is a minor change that fixes FoldTwoEntryPHINode to handle
phis with and/ors of select form and binop form equally.
2021-03-01 13:34:51 +09:00
Kazu Hirata b4bed1cb24 [IR] Use range-based for loops (NFC) 2021-02-28 10:59:23 -08:00
Kazu Hirata d639120983 [llvm] Use set_is_subset (NFC) 2021-02-28 10:59:20 -08:00
William S. Moses b077d82b00 [Attributor] Conditinoally delete fns
Allow the attributor to delete functions only if requested

Differential Revision: https://reviews.llvm.org/D97238
2021-02-27 20:37:42 -05:00
Heejin Ahn aa097ef8d4 [WebAssembly] Fix reverse mapping in WasmEHFuncInfo
D97247 added the reverse mapping from unwind destination to their
source, but it had a critical bug; sources can be multiple, because
multiple BBs can have a single BB as their unwind destination.

This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes
it return a vector rather than a single BB. It does not return the const
reference to the existing vector but creates a new vector because
`WasmEHFuncInfo` stores not `BasicBlock*` or `MachineBasicBlock*` but
`PointerUnion` of them. Also I hoped to unify those methods for
`BasicBlock` and `MachineBasicBlock` into one using templates to reduce
duplication, but failed because various usages require `BasicBlock*` to
be `const` but it's hard to make it `const` for `MachineBasicBlock`
usages.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.
(More precisely, fixes
https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744)

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97583
2021-02-26 17:12:10 -08:00
Fangrui Song 47c5576d7d ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects
If a global object is listed in `@llvm.used`, place it in a unique section with
the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections`
with LLD>=13 or GNU ld>=2.36.

For front ends which do not expect to see multiple sections of the same name,
consider emitting `@llvm.compiler.used` instead of `@llvm.used`.

SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in
binutils. We don't do the restriction - see the rationale in D95749.

The integrated assembler has supported SHF_GNU_RETAIN since D95730.
GNU as>=2.36 supports section flag 'R'.
We don't need to worry about GNU ld support because older GNU ld just ignores
the unknown SHF_GNU_RETAIN.

With this change, `__attribute__((retain))` functions/variables emitted
by clang will get the SHF_GNU_RETAIN flag.

Differential Revision: https://reviews.llvm.org/D97448
2021-02-26 16:38:44 -08:00
Philip Reames f2cfef3596 Be more mathematicly precise about definition of recurrence [NFC]
This clarifies the interface of the matchSimpleRecurrence helper introduced in 8020be0b8 for non-commutative operators.  After ebd3aeba, I realized the original way I framed the routine was inconsistent.  For shifts, we only matched the the LHS form, but for sub we matched both and the caller wanted that information.  So, instead, we now consistently match both forms for non-commutative operators and the caller becomes responsible for filtering if needed.  I tried to put a clear warning in the header because I suspect the RHS form of e.g. a sub recurrence is non-obvious for most folks.  (It was for me.)
2021-02-26 11:22:01 -08:00
Philip Reames ebd3aeba27 Use helper introduced in 8020be0b8 to simplify ValueTracking [NFC]
Direct rewrite of the code the helper was extracted from.
2021-02-26 10:47:26 -08:00
Philip Reames 8020be0b8b Add a helper for matching simple recurrence cycles
This helper came up in another review, and I've got about 4 different patches with copies of this copied into it.  Time to precommit the routine.  :)
2021-02-26 10:21:23 -08:00
Mircea Trofin a2bfc43ae1 [NFC] Const-ed 2 APIs in VirtRegMap 2021-02-26 09:32:42 -08:00
Mircea Trofin a00f7dc2d5 [NFC] MCRegister fixes in RegisterClassInfo, and const-ed APIs 2021-02-26 08:53:57 -08:00
Vladislav Vinogradov 9909237d99 [ADT][NFC] Add extra typedefs to `ArrayRef` and `MutableArrayRef`
* `value_type`
* `pointer`
* `const_pointer`
* `reference`
* `const_reference`
* `const_reverse_iterator`
* `size_type`
* `difference_type`

It makes `ArrayRef` and `MutableArrayRef` types fully compliant with STL Container concept.

Reviewed By: lattner, courbet

Differential Revision: https://reviews.llvm.org/D95611
2021-02-26 18:37:08 +03:00
Evgeniy Brevnov 13a5cac2ba Revert "[NARY-REASSOCIATE] Support reassociation of min/max"
This reverts commit 83d134c3c4.
2021-02-26 19:47:54 +07:00
Stefan Gränitz 406ef36b03 [Orc] Use extensible RTTI for the orc::ObjectLayer class hierarchy
So far we had no way to distinguish between JITLink and RuntimeDyld in lli. Instead, we used implicit knowledge that RuntimeDyld would be used for linking ELF. In order to get D97337 to work with lli though, we have to move on and allow JITLink for ELF. This patch uses extensible RTTI to allow external clients to add their own layers without touching the LLVM sources.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D97338
2021-02-26 13:13:05 +01:00
Chen Zheng d39bc36b1b [debug-info] refactor emitDwarfUnitLength
remove `Hi` `Lo` argument from `emitDwarfUnitLength`, so we
can make caller of emitDwarfUnitLength easier.

Reviewed By: MaskRay, dblaikie, ikudrin

Differential Revision: https://reviews.llvm.org/D96409
2021-02-25 21:00:25 -05:00
James Y Knight 24539f1ef2 Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg.
And then push those change throughout LLVM.

Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).

The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.

Differential Revision: https://reviews.llvm.org/D97223
2021-02-25 18:29:42 -05:00
Francis Visoiu Mistrih fee9abe69c [Remarks] Provide more information about auto-init calls
This now analyzes calls to both intrinsics and functions.

For intrinsics, grab the ones we know and care about (mem* family) and
analyze the arguments.

For calls, use TLI to get more information about the libcalls, then
analyze the arguments if known.

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

Differential Revision: https://reviews.llvm.org/D97489
2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih 4753a69a31 [Remarks] Provide more information about auto-init stores
This adds support for analyzing the instruction with the !annotation
"auto-init" in order to generate a more user-friendly remark.

For now, support the store size, and whether it's atomic/volatile.

Example:

```
auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks]
  int var;
      ^
```

Differential Revision: https://reviews.llvm.org/D97412
2021-02-25 15:14:09 -08:00
Adrian Prantl 00b3f2f310 Add more historic DWARF vendor extensions
The maintainer of libdwarf kindly provided this patch with a bunch of
historic DWARF extensions that are missing from Dwarf.def. This list
is helpful to avoid potential conflicts in the user-defined vendor
extension space in the future.

Patch by David Anderson!

[Relanded with an updated test.]

Differential Revision: https://reviews.llvm.org/D97242
2021-02-25 15:09:42 -08:00
Richard Smith 95d0d8e9e9 Fix constructor declarations that are invalid in C++20 onwards.
Under C++ CWG DR 2237, the constructor for a class template C must be
written as 'C(...)' not as 'C<T>(...)'. This fixes a build failure with
GCC in C++20 mode.

In passing, remove some other redundant '<T>' qualification from the
affected classes.
2021-02-25 14:25:01 -08:00
Fangrui Song 4d63892acb [SanitizerCoverage] Drop !associated on metadata sections
In SanitizerCoverage, the metadata sections (`__sancov_guards`,
`__sancov_cntrs`, `__sancov_bools`) are referenced by functions.  After
inlining, such a `__sancov_*` section can be referenced by more than one
functions, but its sh_link still refers to the original function's section.
(Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to
section violates the invariant.)

If the original function's section is discarded (e.g. LTO internalization +
`ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error.

This above reasoning means that `!associated` is not appropriate to be called by
an inlinable function. Non-interposable functions are inline candidates, so we
have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections
but is expected to parallel a metadata section, so we have to make sure the two
sections are retained or discarded at the same time. A section group does the
trick.  (Note: we have a module ctor, so `getUniqueModuleId` guarantees to
return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return
non-null.)

For interposable functions, we could keep using `!associated`, but
LTO can change the linkage to `internal` and allow such functions to be inlinable,
so we have to drop `!associated`, too. To not interfere with section
group resolution, we need to use the `noduplicates` variant (section group flag 0).
(This allows us to get rid of the ModuleID parameter.)
In -fno-pie and -fpie code (mostly dso_local), instrumented interposable
functions have WeakAny/LinkOnceAny linkages, which are rare. So the
section group header overload should be low.

This patch does not change the object file output for COFF (where `!associated` is ignored).

Reviewed By: morehouse, rnk, vitalybuka

Differential Revision: https://reviews.llvm.org/D97430
2021-02-25 11:59:23 -08:00
Stanislav Mekhanoshin d9c99043bd Option to ignore llvm[.compiler].used uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96087
2021-02-25 10:06:24 -08:00
Stanislav Mekhanoshin 29e2d9461a Option to ignore assume like intrinsic uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96081
2021-02-25 09:48:29 -08:00
Jon Roelofs 7f6e331645 Support `#pragma clang section` directives on MachO targets
rdar://59560986

Differential Revision: https://reviews.llvm.org/D97233
2021-02-25 09:30:10 -08:00
Rong Xu 6103b6ad69 [SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class
This patch makes SampleProfileLoaderBaseImpl a template class so it
can be used in CodeGen transformation.

Noticeable changes:
 * use one template parameter and use IRTraits to get other used
   types an type specific functions.
 * remove the temporary "inline" keywords in previous refactor
   patch.
 * change the template function findEquivalencesFor to a regular
   function. This function has a single caller with type of
   PostDominatorTree. It's simpler to use the type directly
   because MachinePostDominatorTree is not a derived type of
   template DominatorTreeBase.

Differential Revision: https://reviews.llvm.org/D96981
2021-02-25 08:26:17 -08:00
Fraser Cormack b368fc735d [CodeGen] Format code comment to 80 columns. NFC. 2021-02-25 15:55:21 +00:00
Jan Svoboda 43cac1d27d [clang][cli] NFC: Remove ArgList infrastructure for recording queries
This patch removes the infrastructure for recording queries in `ArgList`, partially reverting D94472.

The infrastructure was used during command line round-trip to determine which arguments should a certain subset of `CompilerInvocation` generate.

Since D96280, the command line arguments are being generated all at once, making this code no longer necessary.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D96325
2021-02-25 13:53:24 +01:00
Evgeniy Brevnov 83d134c3c4 [NARY-REASSOCIATE] Support reassociation of min/max
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88287
2021-02-25 18:22:39 +07:00