Commit Graph

44335 Commits

Author SHA1 Message Date
Petar Avramovic d44f61f81c Reland [GlobalISel] Combine zext(trunc x) to x
Recommit 4112299ee7. Depends on
4c8fb7ddd6 which was reverted.

Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-05 11:05:37 +01:00
Luo, Yuanke 8198d83965 [X86] Pass to transform amx intrinsics to scalar operation.
This pass runs in any situations but we skip it when it is not O0 and the
function doesn't have optnone attribute. With -O0, the def of shape to amx
intrinsics is near the amx intrinsics code. We are not able to find a
point which post-dominate all the shape and dominate all amx intrinsics.
To decouple the dependency of the shape, we transform amx intrinsics
to scalar operation, so that compiling doesn't fail. In long term, we
 should improve fast register allocation to allocate amx register.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D93594
2021-03-05 16:02:02 +08:00
Fangrui Song 063b19dea6 [DebugInfo] Delete unused DIVariable::getSource 2021-03-04 21:53:13 -08:00
Fangrui Song 9e28b89827 [DebugInfo] Delete deleted getLine/getColumn
r250405 deleted the functions from DILexicalBlockBase.
2021-03-04 21:49:21 -08:00
Michael Kruse b119120673 [clang][OpenMP] Use OpenMPIRBuilder for workshare loops.
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
 * Only the worksharing-loop directive.
 * Recognizes only the nowait clause.
 * No loop nests with more than one loop.
 * Untested with templates, exceptions.
 * Semantic checking left to the existing infrastructure.

This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
 * The distance function: How many loop iterations there will be before entering the loop nest.
 * The loop variable function: Conversion from a logical iteration number to the loop variable.

These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.

The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.

For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.

Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D94973
2021-03-04 22:52:59 -06:00
Wei Mi 2357d29335 [SampleFDO] Another fix to prevent repeated indirect call promotion in
sample loader pass.

In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
to prevent repeated indirect call promotion for the same indirect call
and the same target, we used zero-count value profile to indicate an
indirect call has been promoted for a certain target. We removed
PromotedInsns cache in the same patch. However, there was a problem in
that patch described below, and that problem led me to add PromotedInsns
back as a mitigation in
https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf.

When we get value profile from metadata by calling getValueProfDataFromInst,
we need to specify the maximum possible number of values we expect to read.
We uses MaxNumPromotions in the last patch so the maximum number of value
information extracted from metadata is MaxNumPromotions. If we have many
values including zero-count values when we write the metadata, some of them
will be dropped when we read them because we only read MaxNumPromotions
values. It will allow repeated indirect call promotion again. We need to
make sure if there are values indicating promoted targets, those values need
to be saved in metadata with higher priority than other values.

The patch fixed that problem. We change to use -1 to represent the count
of a promoted target instead of 0 so it is easier to sort the values.
When we prepare to update the metadata in updateIDTMetaData, we will sort
the values in the descending count order and extract only MaxNumPromotions
values to write into metadata. Since -1 is the max uint64_t number, if we
have equal to or less than MaxNumPromotions of -1 count values, they will
all be kept in metadata. If we have more than MaxNumPromotions of -1 count
values, we will only save MaxNumPromotions such values maximally. In such
case, we have logic in place in doesHistoryAllowICP to guarantee no more
promotion in sample loader pass will happen for the indirect call, because
it has been promoted enough.

With this change, now we can remove PromotedInsns without problem.

Differential Revision: https://reviews.llvm.org/D97350
2021-03-04 18:44:12 -08:00
Chen Zheng 87bbf3d1f8 [XCOFF][DebugInfo] support DWARF for XCOFF for assembly output.
Reviewed By: jasonliu

Differential Revision: https://reviews.llvm.org/D95518
2021-03-04 21:07:52 -05:00
David Blaikie a2a55def35 Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering.
This is included from IR files, and IR doesn't/can't depend on Analysis
(because Analysis depends on IR).

Also fix the implementation - don't use non-member static in headers, as
it leads to ODR violations, inaccurate "unused function" warnings, etc.
And fix the header protection macro name (we don't generally include
"LIB" in the names, so far as I can tell).
2021-03-04 16:14:53 -08:00
Francis Visoiu Mistrih 365b78396a [Remarks] Emit variable info in auto-init remarks
This enhances the auto-init remark with information about the variable
that is auto-initialized.

This is based of debug info if available, or alloca names (mostly for
development purposes).

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

This allows to see things like partial initialization of a variable that
the optimizer won't be able to completely remove.

Differential Revision: https://reviews.llvm.org/D97734
2021-03-04 12:51:22 -08:00
Nicolas Guillemot 6b8cf7356c Revert "[Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream"
This reverts commit 7479a2e00b.

This commit causes compile errors on clang-x64-windows-msvc, so I'm
reverting the patch for now.

For reference, the error in question is:

```
error C2280: 'llvm::raw_ostream_iterator<char,char>
&llvm::raw_ostream_iterator<char,char>::operator =(const
llvm::raw_ostream_iterator<char,char> &)': attempting to reference a deleted
function

note: compiler has generated 'llvm::raw_ostream_iterator<char,char>::operator ='
here

note: 'llvm::raw_ostream_iterator<char,char>
&llvm::raw_ostream_iterator<char,char>::operator =(const
llvm::raw_ostream_iterator<char,char> &)': function was implicitly deleted
because 'llvm::raw_ostream_iterator<char,char>' has a data member
'llvm::raw_ostream_iterator<char,char>::OutStream' of reference type
```
2021-03-04 12:46:58 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Nicolas Guillemot 7479a2e00b [Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream
Adds a class `raw_ostream_iterator` that behaves like
std::ostream_iterator, but can be used with raw_ostream.
This is useful for using raw_ostream with std algorithms.

For example, it can be used to output std containers as follows:

```
std::vector<int> V = { 1, 2, 3 };
std::copy(V.begin(), V.end(), raw_ostream_iterator<int>(outs(), ", "));
// Output: "1, 2, 3, "
```

The API tries to follow std::ostream_iterator as closely as is
practically possible.

Reviewed By: dblaikie, mkitzan

Differential Revision: https://reviews.llvm.org/D78795
2021-03-04 11:12:48 -08:00
Adrian Prantl d268febc56 Improve the debug info for coro-split .resume functions
This patch updates the scope line to point to the suspend point. This
makes the first address in the function point to the first source line
in the resume function rather than the function declaration. Without
this the line table "jumps" from the beginning of the function to the
suspend point at the beginning.

rdar://73386346

Differential Revision: https://reviews.llvm.org/D97345
2021-03-04 11:05:35 -08:00
Albion Fung 36192790d8 [PowerPC][PC Rel] Implement option to omit Power10 instructions from stubs
Implemented the option to omit Power10 instructions from save stubs via the
option --no-power10-stubs or --power10-stubs=no on lld. --power10-stubs= will
override the other option. --power10-stubs=auto also exists to use the default
behaviour (ie allow Power10 instructions in stubs).

Differential Revision: https://reviews.llvm.org/D94627
2021-03-04 13:27:46 -05:00
Nico Weber 76148caa50 Revert "[llvm-exegesis] Disable the LBR check on AMD"
This reverts commit 293e8fa13d.
Breaks build on non-intel hosts, see e.g.
http://45.33.8.238/macm1/4600/step_3.txt
2021-03-04 11:48:33 -05:00
Xiangling Liao e9f9ec837d [CMake][AIX] Adjust plugin library extension used on AIX
As stated in the CMake manual, we are supposed to use MODULE rules to generate
plugin libraries:

"MODULE libraries are plugins that are not linked into other targets but may be
loaded dynamically at runtime using dlopen-like functionality"

Besides, LLVM's plugin infrastructure fits with the AIX treatment of .so
shared objects more than it fits with the AIX treatment of .a library archives
(which may contain shared objects).

Differential revision: https://reviews.llvm.org/D96282
2021-03-04 11:23:06 -05:00
Vy Nguyen 293e8fa13d [llvm-exegesis] Disable the LBR check on AMD
https://bugs.llvm.org/show_bug.cgi?id=48918

The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW.
Theory we've got is that the lbr-checking code could be confused on AMD.

Differential Revision: https://reviews.llvm.org/D97504
2021-03-04 11:16:38 -05:00
Sanjay Patel 36a489d194 [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
Similar to b3a33553ae, but this shows a TODO and a potential
miscompile is already present.

We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification may be possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.

The new test shows that there is an existing bug somewhere in
the callers. We assumed that the original code was fully 'fast'
and so we produced IR with 'fast' even though it was just 'reassoc'.
2021-03-04 10:40:26 -05:00
Nico Weber 59beb1ef6d Revert "[GlobalISel] Combine zext(trunc x) to x"
This reverts commit 4112299ee7.
Seems to depend on 4c8fb7ddd6 which
is being reverted.
2021-03-04 10:13:40 -05:00
Petar Avramovic 4112299ee7 [GlobalISel] Combine zext(trunc x) to x
Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-04 15:05:23 +01:00
Sanjay Patel b3a33553ae [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification seems possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.
2021-03-04 08:53:04 -05:00
Stephen Tozer d2000b45d0 Revert "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values"
This reverts commit d07f106f4a.
2021-03-04 11:59:21 +00:00
gbtozers d07f106f4a [DebugInfo] Add new instruction and DIExpression operator for variadic debug values
This patch adds a new instruction that can represent variadic debug values,
DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set
of basic code changes in MachineInstr and a few adjacent areas, but does not
correctly handle variadic debug values outside of these areas, nor does it
generate them at any point.

The new instruction is similar to the existing DBG_VALUE instruction, with the
following differences: the operands are in a different order, any number of
values may be used in the instruction following the Variable and Expression
operands (these are referred to in code as “debug operands”) and are indexed
from 0 so that getDebugOperand(X) == getOperand(X+2), and the Expression in a
DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator to pass arguments into the
expression.

The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a
DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the
index given by the argument onto the Expression stack. For example the
sub-expression `DW_OP_LLVM_arg, 0` has the meaning “Push the debug operand at
index 0 onto the expression stack.”

Differential Revision: https://reviews.llvm.org/D82363
2021-03-04 11:45:35 +00:00
Andrew Savonichev d791695cb5 [MCA] Add support for in-order CPUs
This patch adds a pipeline to support in-order CPUs such as ARM
Cortex-A55.

In-order pipeline implements a simplified version of Dispatch,
Scheduler and Execute stages as a single stage. Entry and Retire
stages are common for both in-order and out-of-order pipelines.

Differential Revision: https://reviews.llvm.org/D94928
2021-03-04 14:08:19 +03:00
Fraser Cormack c907681b07 [NFC] Fix typos in CallingConvLower.h 2021-03-04 10:29:14 +00:00
Hongtao Yu c75da238b4 [CSSPGO] Deduplicating dangling pseudo probes.
Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far.

This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.

An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.

Differential Revision: https://reviews.llvm.org/D97482
2021-03-03 22:44:42 -08:00
Hongtao Yu 8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Hongtao Yu ad2a59f584 [CSSPGO] Introducing dangling pseudo probes.
Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This logically equals to treating the instruction next to a probe as if it is from the same block of the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.

To mitigate this issue, we first try to move around a dangling probe inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a place holder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission.

If we are unlucky to find such in-block preceding instructions for a probe, the solution we are taking is to tag such probe as dangling so that the samples reported for them will not be trusted by the compiler. We leave it up to the counts inference algorithm to get such probes a reasonable count. The number `UINT64_MAX` is used to mark sample count as collected for a dangling probe.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95962
2021-03-03 22:44:41 -08:00
Johannes Doerfert 5b70c12f3e [Attributor] Make DepClass a required argument
We often used a sub-optimal dependence class in the past because we
didn't see the argument. Let's make it explicit so we remember to think
about it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert e592dad82e [Attributor] Fold "TrackDependence" into the DepClassTy enum
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and much easier to
use optional dependencies. Also avoids mistakes where the bool is
false and enum ignored.
2021-03-04 00:35:52 -06:00
Johannes Doerfert f3f88287c5 [Attributor] Use known alignment as lower bound to avoid work
If we know already more than available from a use, we don't need to
invest time on it.
2021-03-04 00:35:52 -06:00
Philip Reames 99f5417346 Sink routine for replacing a operand bundle to CallBase [NFC]
We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.
2021-03-03 12:07:55 -08:00
Fangrui Song a84f4fc0df [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker does not let `__start_/__stop_` references retain their sections.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on COFF/Mach-O are not affected.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D97649
2021-03-03 11:32:24 -08:00
Philip Reames e6e5ef40cb [basicaa] Fix a latent bug in isGEPBaseAtNegativeOffset
This was pointed out in review of D97520 by Nikita, but existed in the original code as well.

The basic issue is that a decomposed GEP expression describes (potentially) more than one getelementptr.  The "inbounds" derived UB which justifies this aliasing rule requires that the entire offset be composed of "inbounds" geps.  Otherwise, as can be seen in the recently added and changes in this patch test, we can end up with a large commulative offset with only a small sub-offset actually being "inbounds".  If that small sub-offset lies within the object, the result was unsound.

We could potentially be fancier here, but for the moment, simply be conservative when any of the GEPs parsed aren't inbounds.
2021-03-03 08:43:32 -08:00
Philip Reames dd9922c487 [basicaa] Minor indentation fix 2021-03-03 08:33:19 -08:00
Arnold Schwaighofer a42bea211a [coro async] Allow a coro.suspend.async to specify which argument is the context argument
Before we used the same argument as the entry point. The resume partial
function might want to use a different ABI for its context argument

Differential Revision: https://reviews.llvm.org/D97333
2021-03-03 08:27:37 -08:00
Nico Weber 64f5d7e972 Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF"
This reverts commit 04c3040f41.
Breaks instrprof-value-merge.c in bootstrap builds.
2021-03-03 10:21:17 -05:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Hans Wennborg ddf43e5130 revert llvm/include/llvm/Analysis/ObjCARCUtil.h part of 1cc558bd4f 2021-03-03 15:50:02 +01:00
Matt Arsenault 78dcff4841 GlobalISel: Add default implementation of assignValueToReg
Refactor insertion of the asserting ops. This enables using them for
AMDGPU.

This code should essentially be the same for every target. Mips, X86
and ARM all have different code there now, but this seems to be an
accident. The assignment functions are called with different types
than they would be in the DAG, so this is all likely an assortment of
hacks to get around that.
2021-03-03 09:29:53 -05:00
Piotr Sobczak 4672bac177 [AMDGPU] Introduce Strict WQM mode
* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active.
* The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96258
2021-03-03 14:19:16 +01:00
Piotr Sobczak c3ce7bae80 [AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm
 * Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix will become an important, distinguishing factor
between amdgcn_wqm and amdgcn_strictwqm in the future.

The "strict" prefix indicates that inactive lanes do not
take part in control flow, specifically an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.

The amdgcn_wwm will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96257
2021-03-03 09:33:57 +01:00
Carl Ritson 2ddac69f98 [AMDGPU] Rename llvm.amdgcn.msaa.load to llvm.amdgcn.msaa.load.x
While the underlying instruction is called image_msaa_load,
the resource must be x component only.
Rename the intrinsic for clarity.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D97829
2021-03-03 17:30:39 +09:00
Victor Huang 1756b2adc9 [AIX][TLS] Generate TLS variables in assembly files
This patch allows generating TLS variables in assembly files on AIX.
Initialized and external uninitialized variables are generated with the
.csect pseudo-op and local uninitialized variables are generated with
the .comm/.lcomm pseudo-ops. The patch also adds a check to
explicitly say that TLS is not yet supported on AIX.

Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile
Originally patched by: bsaleil
Commandeered by: NeHuang

Differential Revision: https://reviews.llvm.org/D96184
2021-03-02 18:22:48 -06:00
Arthur Eubanks 99f1e86cbb [opt] Error if -debug-pass is specified alongside the new PM
Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D97810
2021-03-02 15:59:28 -08:00
Nikita Popov 29034f3876 [AST] Remove unused Loop member (NFC)
To fix some build bots after D89264.
2021-03-02 23:11:51 +01:00
Nikita Popov 3d8f842712 [LICM] Make promotion faster
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-02 22:10:48 +01:00
Amara Emerson 8a316045ed [AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732
2021-03-02 12:55:51 -08:00
Sanjay Patel 415c67ba4c [SDAG] allow partial undef vector constants with select->logic folds
This is an enhancement suggested in the original review/commit:
D97730 / 7fce3322a2
2021-03-02 14:29:15 -05:00
Vy Nguyen 9a2e2de15f [lld-macho] Change loadReexport to handle the case where a TAPI re-exports to reference documents nested within other TBD.
Currently, it was delibrately impleneted to not handle this case, but as it has turnt out, we need this feature.
The concrete use case is
       `System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa` reexports
               /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit , which then rexports
                    /System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation

The current implemention uses a global currentTopLevelTapi, which is not reset until it finishes loading the whole tree.
This is a problem because if the top-level is set to Cocoa, then when we get to UIFoundation, it will try to find UIFoundation in the current top level, which is Cocoa and will not find it.

The right thing should be:
 - When loading a library from a TBD file, re-exports need to be looked up in the auxiliary documents within the same TBD.
 - When loading from an actual dylib, no additional TBD documents need to be examined.
 - In no case does a re-export mentioned in one TBD file need to be looked up in a document in an auxiliary document from a different TBD file

Differential Revision: https://reviews.llvm.org/D97438
2021-03-02 12:14:31 -05:00
Krzysztof Parzyszek d96b5e606a [TableGen] Add IntrNoMerge as intrinsic property
There is a function attribute 'nomerge' in addition to 'noduplicate'
and 'convergent'. Both 'noduplicate' and 'convergent' have corresponding
intrinsic properties. This patch adds an intrinsic property for the
'nomerge' attribute.

Differential Revision: https://reviews.llvm.org/D96364
2021-03-02 09:04:50 -08:00
dfukalov 6e967834b9 [AA] Cache (optionally) estimated PartialAlias offsets.
For the cases of two clobbering loads and one loaded object is fully contained
in the second `BasicAAResult::aliasGEP` returns just `PartialAlias` that
is actually more common case of partial overlap, it doesn't say anything about
actual overlapping sizes.

AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs
with non-constant offsets. The change stores estimated relative offsets so they
can be used further.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93529
2021-03-02 19:04:15 +03:00
Tim Northover 888c5c24ca AArch64: report fp16 arithmetic is present for apple-a11 CPU.
AArch64.td got it right, but the target-parser dropped it, leading to missing
feature flags in Clang.
2021-03-02 15:07:18 +00:00
Stefan Gränitz ef2389235c [Orc] Add JITLink debug support plugin for ELF x86-64
Add a new ObjectLinkingLayer plugin `DebugObjectManagerPlugin` and infrastructure to handle creation of `DebugObject`s as well as their registration in OrcTargetProcess. The current implementation only covers ELF on x86-64, but the infrastructure is not limited to that.

The journey starts with a new `LinkGraph` / `JITLinkContext` pair being created for a `MaterializationResponsibility` in ORC's `ObjectLinkingLayer`. It sends a `notifyMaterializing()` notification, which is forwarded to all registered plugins. The `DebugObjectManagerPlugin` aims to create a  `DebugObject` form the provided target triple and object buffer. (Future implementations might create `DebugObject`s from a `LinkGraph` in other ways.) On success it will track it as the pending `DebugObject` for the `MaterializationResponsibility`.

This patch only implements the `ELFDebugObject` for `x86-64` targets. It follows the RuntimeDyld approach for debug object setup: it captures a copy of the input object, parses all section headers and prepares to patch their load-address fields with their final addresses in target memory. It instructs the plugin to report the section load-addresses once they are available. The plugin overrides `modifyPassConfig()` and installs a JITLink post-allocation pass to capture them.

Once JITLink emitted the finalized executable, the plugin emits and registers the `DebugObject`. For emission it requests a new `JITLinkMemoryManager::Allocation` with a single read-only segment, copies the object with patched section load-addresses over to working memory and triggers finalization to target memory. For registration, it notifies the `DebugObjectRegistrar` provided in the constructor and stores the previously pending`DebugObject` as registered for the corresponding MaterializationResponsibility.

The `DebugObjectRegistrar` registers the `DebugObject` with the target process. `llvm-jitlink` uses the `TPCDebugObjectRegistrar`, which calls `llvm_orc_registerJITLoaderGDBWrapper()` in the target process via `TargetProcessControl` to emit a `jit_code_entry` compatible with the GDB JIT interface [1]. So far the implementation only supports registration and no removal. It appears to me that it wouldn't raise any new design questions, so I left this as an addition for the near future.

[1] https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D97335
2021-03-02 15:07:35 +01:00
Stefan Gränitz 48c2acff0c [JITLink] LinkGraph::getName() can be const 2021-03-02 15:07:34 +01:00
Jan Svoboda 4545813b17 [clang][cli] NFC: Rename marshalling multiclass
The new name drops `String` from `MarshallingInfoStringInt`, which follows the naming convention of other marshalling multiclasses.
2021-03-02 11:53:40 +01:00
Yuanfang Chen 5de2d189e6 [Diagnose] Unify MCContext and LLVMContext diagnosing
The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.

* Kill LLVMContext inlineasm diagnose handler and migrate it to use
  DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
  covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
  is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
  use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
  handler; if LLVMContext is not available, MCContext uses its own default
  diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D97449
2021-03-01 15:58:37 -08:00
Matt Arsenault 0131498402 GlobalISel: Remove dead code
Generic code should probably not introduce G_INSERT/G_EXTRACT. The
mirror unpackRegs should also be removed, but AMDGPU still has a use
remaining which needs to be fixed.
2021-03-01 17:06:43 -05:00
Fangrui Song 04c3040f41 [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker no longer lets `__start_/__stop_` references retain them.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on other COFF/Mach-O are not affected.

Differential Revision: https://reviews.llvm.org/D97649
2021-03-01 13:43:23 -08:00
Nicolas Guillemot 6fb6bdff37 Fix the value_type of defusechain_iterator to match its operator*()
defusechain_iterator has an operator*() and operator->() that return
references to a MachineOperand, but its "reference" and "pointer"
typedefs are set as if the iterator returns a MachineInstr reference.
This causes compilation errors when defusechain_iterator is used in
generic code that uses the "reference" and "pointer" typedefs.
This patch fixes this by updating the typedefs to use MachineOperand
instead of MachineInstr.

Reviewed By: mkitzan

Differential Revision: https://reviews.llvm.org/D97522
2021-03-01 10:41:10 -08:00
Arthur Eubanks 040c1b49d7 Move EntryExitInstrumentation pass location
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.

Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.

Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D97608
2021-03-01 10:08:10 -08:00
Jay Foad 216dee9170 [AMDGPU] Add IntrWillReturn to recently added intrinsics
This adds IntrWillReturn to the gfx90a mfma intrinsics, to match all the
other mfma intrinsics, and llvm.amdgcn.live.mask, to match
llvm.amdgcn.ps.live.

Differential Revision: https://reviews.llvm.org/D97675
2021-03-01 17:35:26 +00:00
Juneyoung Lee c89d9d8a48 [TTI] Consider select form of and/or i1 as having arithmetic cost
This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b`
as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`.

Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe.
This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked.
These selects should be translated into the assemblies as and/or i1 do in the same manner. The cost should be equivalent.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97360
2021-03-02 02:18:19 +09:00
Andy Wingo 2632ba6a35 [WebAssembly] call_indirect issues table number relocs
If the reference-types feature is enabled, call_indirect will explicitly
reference its corresponding function table via TABLE_NUMBER
relocations against a table symbol.

Also, as before, address-taken functions can also cause the function
table to be created, only with reference-types they additionally cause a
symbol table entry to be emitted.

Differential Revision: https://reviews.llvm.org/D90948
2021-03-01 16:49:00 +01:00
Masoud Ataei 5fe0cab79e [PowerPC] Removing sqrtd2 and sqrtf4 from list of vectorizable function with MASSV
Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call to be inlined.
Inline sqrt is faster than MASSV call on leppc.

Differential Revision: https://reviews.llvm.org/D97487
2021-03-01 15:42:19 +00:00
Jay Foad 796a60d2ea [AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32)
The expected use case is for frontends to insert this into
shaders that are to be run under a debugger. The shader can
then be resumed or single stepped from the point of the call
under debugger control.

Differential Revision: https://reviews.llvm.org/D97670
2021-03-01 14:30:23 +00:00
Matt Arsenault 6c260d3bc0 GlobalISel: Move splitToValueTypes to generic code
I copied the nearly identical function from AArch64 into AMDGPU, so
fix this duplication.

Mips and X86 have their own more exotic versions which should be
removed. However replacing those is better left for a separate patch
since it requires other changes to avoid regressions.
2021-03-01 08:58:18 -05:00
Florian Hahn 53dacb7b67
[LV] Generate RT checks up-front and remove them if required.
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Reviewed By: lebedev.ri, ebrevnov

Differential Revision: https://reviews.llvm.org/D75980
2021-03-01 10:48:04 +00:00
Fraser Cormack 6718fda6ad [CodeGen] Fix issues with subvector intrinsic index types
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.

Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.

The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D97459
2021-03-01 10:28:21 +00:00
Juneyoung Lee 5419b67137 [SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally
This is a minor change that fixes FoldTwoEntryPHINode to handle
phis with and/ors of select form and binop form equally.
2021-03-01 13:34:51 +09:00
Kazu Hirata b4bed1cb24 [IR] Use range-based for loops (NFC) 2021-02-28 10:59:23 -08:00
Kazu Hirata d639120983 [llvm] Use set_is_subset (NFC) 2021-02-28 10:59:20 -08:00
William S. Moses b077d82b00 [Attributor] Conditinoally delete fns
Allow the attributor to delete functions only if requested

Differential Revision: https://reviews.llvm.org/D97238
2021-02-27 20:37:42 -05:00
Heejin Ahn aa097ef8d4 [WebAssembly] Fix reverse mapping in WasmEHFuncInfo
D97247 added the reverse mapping from unwind destination to their
source, but it had a critical bug; sources can be multiple, because
multiple BBs can have a single BB as their unwind destination.

This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes
it return a vector rather than a single BB. It does not return the const
reference to the existing vector but creates a new vector because
`WasmEHFuncInfo` stores not `BasicBlock*` or `MachineBasicBlock*` but
`PointerUnion` of them. Also I hoped to unify those methods for
`BasicBlock` and `MachineBasicBlock` into one using templates to reduce
duplication, but failed because various usages require `BasicBlock*` to
be `const` but it's hard to make it `const` for `MachineBasicBlock`
usages.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.
(More precisely, fixes
https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744)

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97583
2021-02-26 17:12:10 -08:00
Fangrui Song 47c5576d7d ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects
If a global object is listed in `@llvm.used`, place it in a unique section with
the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections`
with LLD>=13 or GNU ld>=2.36.

For front ends which do not expect to see multiple sections of the same name,
consider emitting `@llvm.compiler.used` instead of `@llvm.used`.

SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in
binutils. We don't do the restriction - see the rationale in D95749.

The integrated assembler has supported SHF_GNU_RETAIN since D95730.
GNU as>=2.36 supports section flag 'R'.
We don't need to worry about GNU ld support because older GNU ld just ignores
the unknown SHF_GNU_RETAIN.

With this change, `__attribute__((retain))` functions/variables emitted
by clang will get the SHF_GNU_RETAIN flag.

Differential Revision: https://reviews.llvm.org/D97448
2021-02-26 16:38:44 -08:00
Philip Reames f2cfef3596 Be more mathematicly precise about definition of recurrence [NFC]
This clarifies the interface of the matchSimpleRecurrence helper introduced in 8020be0b8 for non-commutative operators.  After ebd3aeba, I realized the original way I framed the routine was inconsistent.  For shifts, we only matched the the LHS form, but for sub we matched both and the caller wanted that information.  So, instead, we now consistently match both forms for non-commutative operators and the caller becomes responsible for filtering if needed.  I tried to put a clear warning in the header because I suspect the RHS form of e.g. a sub recurrence is non-obvious for most folks.  (It was for me.)
2021-02-26 11:22:01 -08:00
Philip Reames ebd3aeba27 Use helper introduced in 8020be0b8 to simplify ValueTracking [NFC]
Direct rewrite of the code the helper was extracted from.
2021-02-26 10:47:26 -08:00
Philip Reames 8020be0b8b Add a helper for matching simple recurrence cycles
This helper came up in another review, and I've got about 4 different patches with copies of this copied into it.  Time to precommit the routine.  :)
2021-02-26 10:21:23 -08:00
Mircea Trofin a2bfc43ae1 [NFC] Const-ed 2 APIs in VirtRegMap 2021-02-26 09:32:42 -08:00
Mircea Trofin a00f7dc2d5 [NFC] MCRegister fixes in RegisterClassInfo, and const-ed APIs 2021-02-26 08:53:57 -08:00
Vladislav Vinogradov 9909237d99 [ADT][NFC] Add extra typedefs to `ArrayRef` and `MutableArrayRef`
* `value_type`
* `pointer`
* `const_pointer`
* `reference`
* `const_reference`
* `const_reverse_iterator`
* `size_type`
* `difference_type`

It makes `ArrayRef` and `MutableArrayRef` types fully compliant with STL Container concept.

Reviewed By: lattner, courbet

Differential Revision: https://reviews.llvm.org/D95611
2021-02-26 18:37:08 +03:00
Evgeniy Brevnov 13a5cac2ba Revert "[NARY-REASSOCIATE] Support reassociation of min/max"
This reverts commit 83d134c3c4.
2021-02-26 19:47:54 +07:00
Stefan Gränitz 406ef36b03 [Orc] Use extensible RTTI for the orc::ObjectLayer class hierarchy
So far we had no way to distinguish between JITLink and RuntimeDyld in lli. Instead, we used implicit knowledge that RuntimeDyld would be used for linking ELF. In order to get D97337 to work with lli though, we have to move on and allow JITLink for ELF. This patch uses extensible RTTI to allow external clients to add their own layers without touching the LLVM sources.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D97338
2021-02-26 13:13:05 +01:00
Chen Zheng d39bc36b1b [debug-info] refactor emitDwarfUnitLength
remove `Hi` `Lo` argument from `emitDwarfUnitLength`, so we
can make caller of emitDwarfUnitLength easier.

Reviewed By: MaskRay, dblaikie, ikudrin

Differential Revision: https://reviews.llvm.org/D96409
2021-02-25 21:00:25 -05:00
James Y Knight 24539f1ef2 Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg.
And then push those change throughout LLVM.

Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).

The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.

Differential Revision: https://reviews.llvm.org/D97223
2021-02-25 18:29:42 -05:00
Francis Visoiu Mistrih fee9abe69c [Remarks] Provide more information about auto-init calls
This now analyzes calls to both intrinsics and functions.

For intrinsics, grab the ones we know and care about (mem* family) and
analyze the arguments.

For calls, use TLI to get more information about the libcalls, then
analyze the arguments if known.

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

Differential Revision: https://reviews.llvm.org/D97489
2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih 4753a69a31 [Remarks] Provide more information about auto-init stores
This adds support for analyzing the instruction with the !annotation
"auto-init" in order to generate a more user-friendly remark.

For now, support the store size, and whether it's atomic/volatile.

Example:

```
auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks]
  int var;
      ^
```

Differential Revision: https://reviews.llvm.org/D97412
2021-02-25 15:14:09 -08:00
Adrian Prantl 00b3f2f310 Add more historic DWARF vendor extensions
The maintainer of libdwarf kindly provided this patch with a bunch of
historic DWARF extensions that are missing from Dwarf.def. This list
is helpful to avoid potential conflicts in the user-defined vendor
extension space in the future.

Patch by David Anderson!

[Relanded with an updated test.]

Differential Revision: https://reviews.llvm.org/D97242
2021-02-25 15:09:42 -08:00
Richard Smith 95d0d8e9e9 Fix constructor declarations that are invalid in C++20 onwards.
Under C++ CWG DR 2237, the constructor for a class template C must be
written as 'C(...)' not as 'C<T>(...)'. This fixes a build failure with
GCC in C++20 mode.

In passing, remove some other redundant '<T>' qualification from the
affected classes.
2021-02-25 14:25:01 -08:00
Fangrui Song 4d63892acb [SanitizerCoverage] Drop !associated on metadata sections
In SanitizerCoverage, the metadata sections (`__sancov_guards`,
`__sancov_cntrs`, `__sancov_bools`) are referenced by functions.  After
inlining, such a `__sancov_*` section can be referenced by more than one
functions, but its sh_link still refers to the original function's section.
(Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to
section violates the invariant.)

If the original function's section is discarded (e.g. LTO internalization +
`ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error.

This above reasoning means that `!associated` is not appropriate to be called by
an inlinable function. Non-interposable functions are inline candidates, so we
have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections
but is expected to parallel a metadata section, so we have to make sure the two
sections are retained or discarded at the same time. A section group does the
trick.  (Note: we have a module ctor, so `getUniqueModuleId` guarantees to
return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return
non-null.)

For interposable functions, we could keep using `!associated`, but
LTO can change the linkage to `internal` and allow such functions to be inlinable,
so we have to drop `!associated`, too. To not interfere with section
group resolution, we need to use the `noduplicates` variant (section group flag 0).
(This allows us to get rid of the ModuleID parameter.)
In -fno-pie and -fpie code (mostly dso_local), instrumented interposable
functions have WeakAny/LinkOnceAny linkages, which are rare. So the
section group header overload should be low.

This patch does not change the object file output for COFF (where `!associated` is ignored).

Reviewed By: morehouse, rnk, vitalybuka

Differential Revision: https://reviews.llvm.org/D97430
2021-02-25 11:59:23 -08:00
Stanislav Mekhanoshin d9c99043bd Option to ignore llvm[.compiler].used uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96087
2021-02-25 10:06:24 -08:00
Stanislav Mekhanoshin 29e2d9461a Option to ignore assume like intrinsic uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96081
2021-02-25 09:48:29 -08:00
Jon Roelofs 7f6e331645 Support `#pragma clang section` directives on MachO targets
rdar://59560986

Differential Revision: https://reviews.llvm.org/D97233
2021-02-25 09:30:10 -08:00
Rong Xu 6103b6ad69 [SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class
This patch makes SampleProfileLoaderBaseImpl a template class so it
can be used in CodeGen transformation.

Noticeable changes:
 * use one template parameter and use IRTraits to get other used
   types an type specific functions.
 * remove the temporary "inline" keywords in previous refactor
   patch.
 * change the template function findEquivalencesFor to a regular
   function. This function has a single caller with type of
   PostDominatorTree. It's simpler to use the type directly
   because MachinePostDominatorTree is not a derived type of
   template DominatorTreeBase.

Differential Revision: https://reviews.llvm.org/D96981
2021-02-25 08:26:17 -08:00
Fraser Cormack b368fc735d [CodeGen] Format code comment to 80 columns. NFC. 2021-02-25 15:55:21 +00:00
Jan Svoboda 43cac1d27d [clang][cli] NFC: Remove ArgList infrastructure for recording queries
This patch removes the infrastructure for recording queries in `ArgList`, partially reverting D94472.

The infrastructure was used during command line round-trip to determine which arguments should a certain subset of `CompilerInvocation` generate.

Since D96280, the command line arguments are being generated all at once, making this code no longer necessary.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D96325
2021-02-25 13:53:24 +01:00
Evgeniy Brevnov 83d134c3c4 [NARY-REASSOCIATE] Support reassociation of min/max
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88287
2021-02-25 18:22:39 +07:00
Jan Svoboda 88e45f00c1 [clang][cli] Add MarshallingInfoEnum multiclass
This patch introduces a tablegen multiclass called `MarshallingInfoEnum`. It has the same semantics as `MarshallingInfoString` had in combination with `AutoNormalizeEnum`, but it's easier to use and follows the convention used for other `MarshallingInfoXxx` multiclasses.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D97375
2021-02-25 08:47:18 +01:00
Liu, Chen3 4bc7c8631a [X86] Support amx-bf16 intrinsic.
Adding support for intrinsics of AMX-BF16.
This patch alse fix a bug that AMX-INT8 instructions will be selected with wrong
predicate.

Differential Revision: https://reviews.llvm.org/D97358
2021-02-25 09:06:48 +08:00
Greg McGary 151990dd94 [lld-macho] add code signature for native arm64 macOS
Differential Revision: https://reviews.llvm.org/D96164
2021-02-24 17:05:23 -08:00
Duncan P. N. Exon Smith 01701646d5 Transforms: Clone distinct nodes in metadata mapper unless RF_ReuseAndMutateDistinctMDs
This is a follow up to 22a52dfddc and a
revert of df763188c9.

With this change, we only skip cloning distinct nodes in
MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the
no-longer-needed local helper `cloneOrBuildODR()`.  Skipping cloning in
other cases is unsound and breaks CloneModule, which is why the textual
IR for PR48841 didn't pass previously. This commit adds the test as:
Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll

Cloning less often exposed a hole in subprogram cloning in
CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's
test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function
has a subprogram attachment whose scope is a DICompositeType that
shouldn't be cloned, but it has no internal debug info pointing at that
type, that composite type was being cloned. This commit plugs that hole,
calling DebugInfoFinder::processSubprogram from CloneFunctionInto.

As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit
message, I think we need to formalize ownership of metadata a bit more
so that ValueMapper/CloneFunctionInto (and similar functions) can deal
with cloning (or not) metadata in a more generic, less fragile way.

This fixes PR48841.

Differential Revision: https://reviews.llvm.org/D96734
2021-02-24 12:57:52 -08:00
Duncan P. N. Exon Smith 1e1b92f76d IR: Rename Metadata::ImplicitCode to SubclassData1, NFC
Metadata::ImplicitCode is a bit shaved off of Metadata::Storage,
currently only in use by the subclass DILocation. However, the bit isn't
reserved for that purpose. Rename it `SubclassData1` to make it clear
that it has nothing to do with Metadata itself (and other subclasses are
free to use it).

As a drive-by, remove an old TODO about exposing bits to subclasses
(looks like that has mostly been done).

No functionality change here.

Differential Revision: https://reviews.llvm.org/D96740
2021-02-24 12:56:26 -08:00
James Y Knight c2487bf7df Remove a workaround for MSVC 2013, now that MSVC 2017 is the minimum.
In MSVC 2013, 'alignas(integer-template-arg)' didn't compile; verified
on godbolt that this now works properly.
2021-02-24 13:56:49 -05:00
Lang Hames 8380d07e39 [JITLink] Add assertions, fix a comment.
The new assertions check that Addressables removed when removing
external or absolute symbols are not referenced by another symbol.

A comment on post-fixup passes is updated: vmaddrs have all been
set up by the time the pre-fixup passes are run, post-fixup passes
run after fixups have been applied to content.
2021-02-24 21:02:37 +11:00
Dan Liew 7d3ef103b5 [ASan] Introduce a way set different ways of emitting module destructors.
Previously there was no way to control how module destructors were emitted
by `ModuleAddressSanitizerPass`. However, we want language frontends (e.g. Clang)
to be able to decide how to emit these destructors (if at all).

This patch introduces the `AsanDtorKind` enum that represents the different ways
destructors can be emitted. There are currently only two valid ways to emit destructors.

* `Global` - Use `llvm.global_dtors`. This was the previous behavior and is the default.
* `None`   - Do not emit module destructors.

The `ModuleAddressSanitizerPass` and the various wrappers around it have been updated
to take the `AsanDtorKind` as an argument.

The `-asan-destructor-kind=` command line argument has been introduced to make this
easy to test from `opt`. If this argument is specified it overrides the value passed
to the `ModuleAddressSanitizerPass` constructor.

Note that `AsanDtorKind` is not `bool` because we will introduce a new way to
emit destructors in a subsequent patch.

Note that `AsanDtorKind` is given its own header file because if it is declared
in `Transforms/Instrumentation/AddressSanitizer.h` it leads to compile error
(Module is ambiguous) when trying to use it in
`clang/Basic/CodeGenOptions.def`.

rdar://71609176

Differential Revision: https://reviews.llvm.org/D96571
2021-02-23 20:01:21 -08:00
Nico Weber f14a14dd25 Revert "Add more historic DWARF vendor extensions"
This reverts commit c4a9144468.
Breaks check-llvm everywhere, see https://reviews.llvm.org/D97242#2583716
2021-02-23 22:10:02 -05:00
Chen Zheng be5d92e37e [Debug-Info][NFC] move emitDwarfUnitLength to MCStreamer class
We may need to do some customization for DWARF unit length in DWARF
section headers for some targets for some code generation path.

For example, for XCOFF in assembly path, AIX assembler does not require
the debug section containing its debug unit length in the header.

Move emitDwarfUnitLength to MCStreamer class so that we can do
customization in different Streamers

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D95932
2021-02-23 21:29:05 -05:00
Adrian Prantl c4a9144468 Add more historic DWARF vendor extensions
The maintainer of libdwarf kindly provided this patch with a bunch of
historic DWARF extensions that are missing from Dwarf.def. This list
is helpful to avoid potential conflicts in the user-defined vendor
extension space in the future.

Patch by David Anderson!

Differential Revision: https://reviews.llvm.org/D97242
2021-02-23 17:54:04 -08:00
Juneyoung Lee 56d228a14e [SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes
This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes.

A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97244
2021-02-24 10:40:50 +09:00
Erich Keane af4451eb4f [NFC] Make TrailingObjects non-copyable/non-movable
This got me pretty recently... TrailingObjects cannot be copied or
moved, since they need to be pre-allocated. This patch deletes the copy
and move operations (plus re-adds the default ctor).

Differential Revision: https://reviews.llvm.org/D97324
2021-02-23 16:30:13 -08:00
Fangrui Song ef312951fd collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128
And delete the SmallPtrSetImpl overload.

While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97257
2021-02-23 16:09:06 -08:00
Fangrui Song 3adb89bb9f [ThinLTO] Make cloneUsedGlobalVariables deterministic
Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements
is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements.

While here, decrease inline element counts from 8 to 4. The number of
`llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full
LTO/hybrid LTO, the number may be large, so we need to be careful.

According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is
good for a large project with WholeProgramDevirt, when available_externally
vtables are placed in the llvm.compiler.used set.

Differential Revision: https://reviews.llvm.org/D97128
2021-02-23 16:09:05 -08:00
Heejin Ahn ea8c6375e3 [WebAssembly] Fix incorrect grouping and sorting of exceptions
This CL is not big but contains changes that span multiple analyses and
passes. This description is very long because it tries to explain basics
on what each pass/analysis does and why we need this change on top of
that. Please feel free to skip parts that are not necessary for your
understanding.

---

`WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
unwind destination>. The value (unwind dest) here is where an exception
should end up when it is not caught by the key (EH pad). We record this
info in WasmEHPrepare to fix catch mismatches, because the CFG itself
does not have this info. A CFG only contains BBs and
predecessor-successor relationship between them, but in `WasmEHFuncInfo`
the unwind destination BB is not necessarily a successor or the key EH
pad BB. Their relationship can be intuitively explained by this C++ code
snippet:
```
try {
  try {
    foo();
  } catch (int) { // EH pad
    ...
  }
} catch (...) {   // unwind destination
}
```
So when `foo()` throws, it goes to `catch (int)` first. But if it is not
caught by it, it ends up in the next unwind destination `catch (...)`.
This unwind destination is what you see in `catchswitch`'s
`unwind label %bb` part.

---

`WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
continuously together in CFGSort, as we do for loops. What this analysis
does is very simple: it creates a single `WebAssemblyException` per EH
pad, and all BBs that are dominated by that EH pad are included in this
exception. We also identify subexception relationship in this way: if
EHPad A domiantes EHPad B, EHPad B's exception is a subexception of
EHPad A's exception.

This simple rule turns out to be incorrect in some cases. In
`WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
semantically EHPad B should not be included in EHPad A's exception,
because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this:
```
try
  try
  catch
    ...   <- %dest_bb is among here!
  end
delegate %dest_bb
```

So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
make sure excptions' unwind destinations are not subexceptions of
their unwind sources in `WasmEHFuncInfo`.

But this alone does not prevent `dest_bb` in the example above from
being sorted within the inner `catch`'s exception, even if its exception
is not a subexception of that `catch`'s exception anymore, because of
how CFGSort works, which will be explained below.

---

CFGSort places BBs within the same `SortRegion` (loop or exception)
continuously together so they can be demarcated with `loop`-`end_loop`
or `catch`-`end_try` in CFGStackify.

`SortRegion` is a wrapper for one of `MachineLoop` or
`WebAssemblyException`. `SortRegionInfo` already does some complicated
things because there discrepancies between those two data structures.
`WebAssemblyException` is what we control, and it is defined as an EH
pad as its header and BBs dominated by the header as its BBs (with a
newly added exception of unwind destinations explained in the previous
paragraph). But `MachineLoop` is an LLVM data structure and uses the
standard loop detection algorithm. So by the algorithm, BBs that are 1.
dominated by the loop header and 2. have a path back to its header.
Because of the second condition, many BBs that are dominated by the loop
header are not included in the loop. So BBs that contain `return` or
branches to outside of the loop are not technically included in
`MachineLoop`, but they can be sorted together with the loop with no
problem.

Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
we allow sorting of not only BBs that belong to the current innermost
region but also BBs that are by the current region header.
(This was written this way from the first version written by Dan, when
only loops existed.) But now, we have cases in exceptions when EHPad B
is the unwind destination for EHPad A, even if EHPad B is dominated by
EHPad A it should not be included in EHPad A's exception, and should not
be sorted within EHPad A.

One way to make things work, at least correctly, is change `dominates`
condition to `contains` condition for `SortRegion` when sorting BBs, but
this will change compilation results for existing non-EH code and I
can't be sure it will not degrade performance or code size. I think it
will degrade performance because it will force many BBs dominated by a
loop, which don't have the path back to the header, to be placed after
the loop and it will likely to create more branches and blocks.

So this does a little hacky check when adding BBs to `Preferred` list:
(`Preferred` list is a ready list. CFGSort maintains ready list in two
priority queues: `Preferred` and `Ready`. I'm not very sure why, but it
was written that way from the beginning. BBs are first added to
`Preferred` list and then some of them are pushed to `Ready` list, so
here we only need to guard condition for `Preferred` list.)

When adding a BB to `Preferred` list, we check if that BB is an unwind
destination of another BB. To do this, this adds the reverse mapping,
`UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
is an unwind destination, it checks if the current stack of regions
(`Entries`) contains its source BB by traversing the stack backwards. If
we find its unwind source in there, we add the BB to its `Deferred`
list, to make sure that unwind destination BB is added to `Preferred`
list only after that region with the unwind source BB is sorted and
popped from the stack.

---

This does not contain a new test that crashes because of this bug, but
this fix changes the result for one of existing test case. This test
case didn't crash because it fortunately didn't contain `delegate` to
the incorrectly placed unwind destination BB.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97247
2021-02-23 14:54:55 -08:00
Matthew Voss 6da7d31416 [llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata
Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducible.

Differential Revision: https://reviews.llvm.org/D92074
2021-02-23 12:51:54 -08:00
Jay Foad a6be26710b [GlobalISel] Make more use of replaceSingleDefInstWithReg. NFC. 2021-02-23 17:08:34 +00:00
Juneyoung Lee 19c2e12947 [JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns
This allows JumpThreading's computeValueKnownInPredecessors to
recognize select form of and/or patterns as well.
2021-02-24 00:06:10 +09:00
Nate Chandler 01b4890e47 Add @llvm.coro.async.size.replace intrinsic.
The new intrinsic replaces the size in one specified AsyncFunctionPointer with
the size in another.  This ability is necessary for functions which merely
forward to async functions such as those defined for partial applications.

Reviewed By: aschwaighofer

Differential Revision: https://reviews.llvm.org/D97229
2021-02-23 06:43:52 -08:00
David Green dd2dbf7ee2 [TTI] Change getOperandsScalarizationOverhead to take Type args
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287
2021-02-23 13:04:59 +00:00
David Green bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
Alexey Lapshin 875b3b2cdd [Support] Add reserve() method to the raw_ostream.
If resulting size of the output stream is already known,
then the space for stream data could be preliminary
allocated in some cases. f.e. raw_string_ostream could
preallocate the space for the target string(it allows
to avoid reallocations during writing into the stream).

Differential Revision: https://reviews.llvm.org/D91693
2021-02-23 14:06:38 +03:00
Andy Wingo 7dc98adbb0 Revert "[WebAssembly] call_indirect issues table number relocs"
This reverts commit 861dbe1a02.  It broke
emscripten -- see https://reviews.llvm.org/D90948#2578843.
2021-02-23 11:48:08 +01:00
Liu, Chen3 f8b9035aae [X86] Support amx-int8 intrinsic.
Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD.

Differential Revision: https://reviews.llvm.org/D97259
2021-02-23 17:08:05 +08:00
Lang Hames 430817d0d5 [JITLink] Add a getFixupAddress convenience method to Block. 2021-02-23 11:08:54 +11:00
Lang Hames adf2098bd8 [JITLink] Don't allow creation of sections with duplicate names. 2021-02-23 11:08:54 +11:00
Heejin Ahn a08e609d2e [WebAssembly] Rename methods in WasmEHFuncInfo (NFC)
This renames variable and method names in `WasmEHFuncInfo` class to be
simpler and clearer. For example, unwind destinations are EH pads by
definition so it doesn't necessarily need to be included in every method
name. Also I am planning to add the reverse mapping in a later CL,
something like `UnwindDestToSrc`, so this renaming will make meanings
clearer.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D97173
2021-02-22 12:16:11 -08:00
Leonard Chan 1c932baeaa [llvm][Bitcode] Add bitcode reader/writer for DSOLocalEquivalent
This is necessary for compilation with [thin]lto.

Differential Revision: https://reviews.llvm.org/D96170
2021-02-22 10:37:57 -08:00
Nikita Popov 5e7e499b91 [JumpThreading] Clone noalias.scope.decl when threading blocks
When cloning instructions during jump threading, also clone and
adapt any declared scopes. This is primarily important when
threading loop exits, because we'll end up with two dominating
scope declarations in that case (at least after additional loop
rotation). This addresses a loose thread from
https://reviews.llvm.org/rG2556b413a7b8#975012.

Differential Revision: https://reviews.llvm.org/D97154
2021-02-22 18:35:30 +01:00
Ryan Santhiraraja 2c25efcbd3 [AArch64] Adding SHA3 Intrinsics support
This patch adds the following SHA3 Intrinsics:
        vsha512hq_u64,
        vsha512h2q_u64,
        vsha512su0q_u64,
        vsha512su1q_u64
        veor3q_u8
        veor3q_u16
        veor3q_u32
        veor3q_u64
        veor3q_s8
        veor3q_s16
        veor3q_s32
        veor3q_s64
        vrax1q_u64
        vxarq_u64
        vbcaxq_u8
        vbcaxq_u16
        vbcaxq_u32
        vbcaxq_u64
        vbcaxq_s8
        vbcaxq_s16
        vbcaxq_s32
        vbcaxq_s64

    Note need to include +sha3 and +crypto when building from the front-end

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D96381
2021-02-22 12:09:20 +00:00
Andy Wingo 861dbe1a02 [WebAssembly] call_indirect issues table number relocs
If the reference-types feature is enabled, call_indirect will explicitly
reference its corresponding function table via `TABLE_NUMBER`
relocations against a table symbol.

Also, as before, address-taken functions can also cause the function
table to be created, only with reference-types they additionally cause a
symbol table entry to be emitted.

We abuse the used-in-reloc flag on symbols to indicate which tables
should end up in the symbol table.  We do this because unfortunately
older wasm-ld will carp if it see a table symbol.

Differential Revision: https://reviews.llvm.org/D90948
2021-02-22 10:13:36 +01:00
Kazu Hirata 5032b5890b [llvm] Fix header guards (NFC)
Identified with llvm-header-guard.
2021-02-21 19:58:05 -08:00
madhur13490 5fe23de5db [NFC] Remove redundant word in comment
Differential Revision: https://reviews.llvm.org/D97157
2021-02-21 18:04:20 +00:00
Nikita Popov e0615bcd39 [Loads] Add optimized FindAvailableLoadedValue() overload (NFCI)
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.

If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.
2021-02-21 18:42:56 +01:00
Juneyoung Lee aacf7878bc [ValueTracking] Improve impliesPoison
This patch improves ValueTracking's impliesPoison(V1, V2) to do this reasoning:

```
  %res = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul      = extractvalue { i64, i1 } %res, 0

	; If %mul is poison, %overflow is also poison, and vice versa.
```

This improvement leads to supporting this optimization under `-instcombine-unsafe-select-transform=0`:

```
define i1 @test2_logical(i64 %a, i64 %b, i64* %ptr) {
; CHECK-LABEL: @test2_logical(
; CHECK-NEXT:    [[MUL:%.*]] = mul i64 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    [[TMP1:%.*]] = icmp ne i64 [[A]], 0
; CHECK-NEXT:    [[TMP2:%.*]] = icmp ne i64 [[B]], 0
; CHECK-NEXT:    [[OVERFLOW_1:%.*]] = and i1 [[TMP1]], [[TMP2]]
; CHECK-NEXT:    [[NEG:%.*]] = sub i64 0, [[MUL]]
; CHECK-NEXT:    store i64 [[NEG]], i64* [[PTR:%.*]], align 8
; CHECK-NEXT:    ret i1 [[OVERFLOW_1]]
;

  %res = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul = extractvalue { i64, i1 } %res, 0
  %cmp = icmp ne i64 %mul, 0
  %overflow.1 = select i1 %overflow, i1 true, i1 %cmp
  %neg = sub i64 0, %mul
  store i64 %neg, i64* %ptr, align 8
  ret i1 %overflow.1
}
```

Previously, this didn't happen because the flag prevented `select i1 %overflow, i1 true, i1 %cmp` from being `or i1 %overflow, %cmp`.
Note that the select -> or conversion happens only when `impliesPoison(%cmp, %overflow)` returns true.
This improvement allows `impliesPoison` to do the reasoning.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96929
2021-02-20 13:22:34 +09:00
Craig Topper baab797878 [ValueTypes] Assert if changeVectorElementType is called on a simple type with an extended element type.
Previously we would use the extended implementation, but
the extended implementation requires the vector type to be extended
so that we can access the LLVMContext. In theory we could
detect this case and use the context from the element type instead,
but since I know of no cases hitting this in practice today
I've done the simplest thing.

Also add asserts to several extended EVT functions that assume
LLVMTy is non-null.

Follow from discussion in D97036

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D97070
2021-02-19 17:30:46 -08:00
Mircea Trofin 82492f24ff [NFC][Regalloc] Share the VirtRegAuxInfo object with LiveRangeEdit
VirtRegAuxInfo is an extensibility point, so the register allocator's
decision on which implementation to use should be communicated to the
other users - namely, LiveRangeEdit.

Differential Revision: https://reviews.llvm.org/D96898
2021-02-19 07:44:28 -08:00
Simon Pilgrim aa44815f84 Remove unnecessary "using namespace llvm" inside "namespace llvm". NFCI. 2021-02-19 11:15:16 +00:00
Nikita Popov 370addb996 [IR] Move willReturn() to Instruction
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).

I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)

Differential Revision: https://reviews.llvm.org/D96992
2021-02-19 11:56:01 +01:00
Djordje Todorovic 1a2b3536ef Reland "[Debugify] Make the debugify aware of the original (-g) Debug Info"
As discussed on the RFC [0], I am sharing the set of patches that
    enables checking of original Debug Info metadata preservation in
    optimizations. The proof-of-concept/proposal can be found at [1].

    The implementation from the [1] was full of duplicated code,
    so this set of patches tries to merge this approach into the existing
    debugify utility.

    For example, the utility pass in the original-debuginfo-check
    mode could be invoked as follows:

      $ opt -verify-debuginfo-preserve -pass-to-test sample.ll

    Since this is very initial stage of the implementation,
    there is a space for improvements such as:
      - Add support for the new pass manager
      - Add support for metadata other than DILocations and DISubprograms

    [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ
    [1] https://github.com/djolertrk/llvm-di-checker

    Differential Revision: https://reviews.llvm.org/D82545

The test that was failing is now forced to use the old PM.
2021-02-18 23:29:22 -08:00
Serge Pavlov 2c4f60e45b [FPEnv][AArch64] Implement lowering of llvm.set.rounding
Differential Revision: https://reviews.llvm.org/D96836
2021-02-19 13:16:51 +07:00
Lang Hames 0469256d35 [ORC] Print CPU feature string in JITTargetMachineBuilder debugging output. 2021-02-19 15:18:19 +11:00
Konstantin Zhuravlyov 71d1f785a5 AMDGPU/ELF: Sort MACHs by value and add missing reserved MACHs
- Sort MACHs by its value
  - Add missing reserved MACHs
    - EF_AMDGPU_MACH_AMDGCN_RESERVED_0X3D
    - EF_AMDGPU_MACH_AMDGCN_RESERVED_0X3E

Differential Revision: https://reviews.llvm.org/D97010
2021-02-18 20:46:27 -05:00
Wei Mi 5fb65c02ca [SampleFDO] Stop repeated indirect call promotion for the same target.
Found a problem in indirect call promotion in sample loader pass. Currently
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundent which can harm performance and can cause
excessive compile time in some extreme case.

The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.

The fix brings 0.1~0.2% performance on our search benchmark.

Differential Revision: https://reviews.llvm.org/D96806
2021-02-18 17:01:32 -08:00
Leonard Chan c77659e549 [llvm][IR] Do not place constants with static relocations in a mergeable section
This patch provides two major changes:

1. Add getRelocationInfo to check if a constant will have static, dynamic, or
   no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.)
2. Only allow a constant with no relocations (static or dynamic) to be placed
   in a mergeable section.

This will allow unused symbols that contain static relocations and happen to
fit in mergeable constant sections (.rodata.cstN) to instead be placed in
unique-named sections if -fdata-sections is used and subsequently garbage collected
by --gc-sections.

See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html.

Differential Revision: https://reviews.llvm.org/D95960
2021-02-18 15:39:00 -08:00
Petr Hosek 5fbd1a333a [Coverage] Store compilation dir separately in coverage mapping
We currently always store absolute filenames in coverage mapping.  This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines.  We are also
duplicating the path prefix potentially wasting space.

This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.

Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.

Differential Revision: https://reviews.llvm.org/D95753
2021-02-18 14:34:39 -08:00
Nikita Popov 70e3c9a8b6 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.

With this patch we get marginally better results:

  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
Petr Hosek fbf8b957fd Revert "[Coverage] Store compilation dir separately in coverage mapping"
This reverts commit 97ec8fa5bb since
the test is failing on some bots.
2021-02-18 12:50:24 -08:00
Petr Hosek 97ec8fa5bb [Coverage] Store compilation dir separately in coverage mapping
We currently always store absolute filenames in coverage mapping.  This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines.  We are also
duplicating the path prefix potentially wasting space.

This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.

Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.

Differential Revision: https://reviews.llvm.org/D95753
2021-02-18 12:27:42 -08:00
Sam Powell eb2eeeb76f [llvm][TextAPI] add equality operator for InterfaceFile
This patch adds functionality to compare for the equality between `InterfaceFile`s based on attributes specific to linking.

Reviewed By: cishida, steven_wu

Differential Revision: https://reviews.llvm.org/D96629
2021-02-18 11:53:08 -08:00
Ta-Wei Tu f70cdc5b5c [NPM] Properly reset parent loop after loop passes
This fixes https://bugs.llvm.org/show_bug.cgi?id=49185

When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`.
If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes.
However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes.

This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`.

Reviewed By: asbirlea, ychen

Differential Revision: https://reviews.llvm.org/D96727
2021-02-19 02:50:53 +08:00
Bradley Smith 8bad8a43c3 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00