Commit Graph

151087 Commits

Author SHA1 Message Date
Yvan Roux 261cbe98c3 [RISCV] Fix Machine Outliner jump table handling.
Don't outline machine instructions which are using jump table indexes
since they are materialized as local labels (like the already handled
case of constant pools).

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D109436
2021-09-09 07:32:30 +02:00
Chris Lattner 9e46dd965a [APInt.h] Reduce the APInt header file interface a bit. NFC
This moves one mid-size function out of line, inlines the
trivial tcAnd/tcOr/tcXor/tcComplement methods into their only
caller, and moves the magic/umagic functions into SelectionDAG
since they are implementation details of its algorithm. This
also removes the unit tests for magic, but these are already
tested in the divide lowering logic for various targets.

This also upgrades some C style comments to C++.

Differential Revision: https://reviews.llvm.org/D109476
2021-09-08 18:17:07 -07:00
Jessica Paquette 22a64d4a14 [MachineOutliner][AArch64] Ensure LR is live-in when inserting reg-save calls
Similar to other code which handles creating the function frame.

If LR isn't live-in to the block that we're inserting the call into, we'll get
a MachineVerifier error.
2021-09-08 17:44:27 -07:00
Amara Emerson eae44c8a86 [GlobalISel] Implement merging of stores of truncates.
This is a port of a combine which matches a pattern where a wide type scalar
value is stored by several narrow stores. It folds it into a single store or
a BSWAP and a store if the targets supports it.

Assuming little endian target:
 i8 *p = ...
 i32 val = ...
 p[0] = (val >> 0) & 0xFF;
 p[1] = (val >> 8) & 0xFF;
 p[2] = (val >> 16) & 0xFF;
 p[3] = (val >> 24) & 0xFF;
=>
 *((i32)p) = val;

On CTMark AArch64 -Os this results in a good amount of savings:

Program            before        after       diff
             SPASS 412792       412788       -0.0%
                kc 432528       432512       -0.0%
            lencod 430112       430096       -0.0%
  consumer-typeset 419156       419128       -0.0%
            bullet 475840       475752       -0.0%
        tramp3d-v4 367760       367628       -0.0%
          clamscan 383388       383204       -0.0%
    pairlocalalign 249764       249476       -0.1%
    7zip-benchmark 570100       568860       -0.2%
           sqlite3 287628       286920       -0.2%
Geomean difference                           -0.1%

Differential Revision: https://reviews.llvm.org/D109419
2021-09-08 17:06:33 -07:00
Philip Reames e741fabc22 [SCEV] Move getIndexExpressionsFromGEP to delinearize [NFC] 2021-09-08 16:56:49 -07:00
Philip Reames 4b5e260b1d [SCEV] Simplify findExistingSCEVInCache interface [NFC]
We were returning a tuple when all but one caller only cared about one piece of the return value.  That one caller can inline the complexity, and we can simplify all other uses.
2021-09-08 15:26:07 -07:00
Andrew Litteken 144cd22bae [CodeExtractor] Creating exit stubs based off original order branch instructions.
Previously the CodeExtractor created exit stubs, and the subsequent return value of the outlined function based on the order of out-of-region blocks after splitting any phi nodes, and collecting the blocks to be outlined. This could cause differences in order if there was a difference of exit block phi nodes between the two regions. This patch moves the collection of the output target blocks to be before this occurs, so that the assignment of target block to output value will be the same, regardless of the contents of the output block.

Reviewers: paquette, roelofs

Differential Revision: https://reviews.llvm.org/D108657
2021-09-08 15:15:15 -07:00
Arthur Eubanks fe15347a1e Port the cost model printer to New PM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109284
2021-09-08 14:47:05 -07:00
Craig Topper a574f0e0c3 [RISCV] Disable use of i128 shift libcalls on RV32.
Since i128 isn't a legal C type on RV32, I don't believe
libgcc implements these functions for RV32. compiler-rt
does implement them because i128 support is enabled
in order to handle long double.

This is consistent with 32-bit X86 and ARM.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D109383
2021-09-08 14:26:07 -07:00
Michael Kruse 088577a38e [Delinerization] Require by offset to be zero.
Users of delinearization assume that the the offset into the array element is zero. In most cases it will indeed be zero, but if it is not, the delinearization has to fail since it violates that assumption without the API even allowing to signal to the caller that the by offset is non-zero.

This bug caused Polly to miscompile blender (526.blender_r from SPEC CPU 2017) in -polly-process-unprofitable mode. The SCEV expression incorrectly delinearized has been reduced in the test case byte_offset.ll. The dropped offset into the array element of size 4 (a float) is ((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>). This significant component was just dropped, and the wrong pointer was computed when regenerating code from the remaining delinearized subscripts. This occurred during blender's subsurface scattering implementation. As a result, blender's rendering diverged from the reference image.

Patch D108885 would also fix the API.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D109133
2021-09-08 16:02:37 -05:00
Greg Clayton 14850a0628 Log to the right stream in DwarfTransformer::handleDie().
Since we might end up using multiple threads when logging information in the DWARFTransformer, the handleDie() method must use the supplied stream named "OS" when logging warnings and errors. When we use multiple threads, we log to a thread specific stream buffer and then use a mutex to ensure our output doesn't overlap when we emit warnings and errors after a thread is done.

Differential Revision: https://reviews.llvm.org/D109401
2021-09-08 14:00:19 -07:00
Florian Hahn f4726e7238
[LAA] Remove unused OrigPtr from replaceSymbolicStrideSCEV (NFC).
The OrigPtr argument is not used in tree.
2021-09-08 22:35:36 +02:00
Nikita Popov 6dfdc6bfd2 [SROA] Support opaque pointers
Make the following changes in order to support opaque pointers in SROA:

 * Generate i8 GEPs for opaque pointers.
 * Explicitly enforce that promotable allocas only have stores of
   the alloca type -- previously this was implicitly enforced.
 * Replace a check for pointer element type with load/store type.

Differential Revision: https://reviews.llvm.org/D109259
2021-09-08 22:25:44 +02:00
Arthur Eubanks b493124ae2 [MemorySSA] Support invariant.group metadata
The implementation is mostly copied from MemDepAnalysis. We want to look
at all loads and stores to the same pointer operand. Bitcasts and zero
GEPs of a pointer are considered the same pointer value. We choose the
most dominating instruction.

Since updating MemorySSA with invariant.group is non-trivial, for now
handling of invariant.group is not cached in any way, so it's part of
the walker. The number of loads/stores with invariant.group is small for
now anyway. We can revisit if this actually noticeably affects compile
times.

To avoid invariant.group affecting optimized uses, we need to have
optimizeUsesInBlock() not use invariant.group in any way.

Co-authored-by: Piotr Padlewski <prazek@google.com>

Reviewed By: asbirlea, nikic, Prazek

Differential Revision: https://reviews.llvm.org/D109134
2021-09-08 13:06:12 -07:00
Philip Reames 585c594d74 Move delinearization logic out of SCEV [NFC]
None of this logic has anything to do with SCEV's internals, it just uses the existing public APIs.  As a result, we can move the code from ScalarEvolution.cpp/hpp to Delinearization.cpp/hpp with only minor changes.

This was discussed in advance on today's loop opt call.  It turned out to be easy as hoped.
2021-09-08 12:28:35 -07:00
Nikita Popov 3e54de4df2 [ConstantHoisting] Support opaque pointers
Directly use i8 for GEP, rather than fetching element type of i8*.
2021-09-08 21:23:10 +02:00
Akira Hatanaka dea6f71af0 [ObjC][ARC] Use the addresses of the ARC runtime functions instead of
integer 0/1 for the operand of bundle "clang.arc.attachedcall"

https://reviews.llvm.org/D102996 changes the operand of bundle
"clang.arc.attachedcall". This patch makes changes to llvm that are
needed to handle the new IR.

This should make it easier to understand what the IR is doing and also
simplify some of the passes as they no longer have to translate the
integer values to the runtime functions.

Differential Revision: https://reviews.llvm.org/D103000
2021-09-08 11:58:03 -07:00
Andrew Litteken 0087bb4a9a [IROutliner] Using canonical values to find corresponding values. (NFC)
D104143 introduced canonical value numbering between regions, which allows for the easy identification of items across a region, eliminating the need in the outliner to create parallel lists of instructions for each region, and replace output values in a less convoluted way.

Additionally, in a future commit, the output values will not necessarily be recorded values from the region itself, it could be a combination value where the actual value being output is a PHINode instead.  This new method allows us to handle the replacement of the output value to the stored value with the corresponding item in the same place for both normal output values, and PHINode outputs instead of handling the different types of outputs in different locations.

Reviewers: paquette, roelofs

Differential Revision: https://reviews.llvm.org/D108656
2021-09-08 11:36:05 -07:00
Joseph Huber 6b9a3ec3a2 [OpenMP] Do not SPMDize generic regions with no parallel
This patch changes SPMDization to not trigger for regions with no
parallelism. Otherwise, this will introduce unnecessary barriers that
will slow the single-threaded region down.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109438
2021-09-08 14:33:15 -04:00
Nick Desaulniers 4331f19d8b [ISEL][BitTestBlock] omit additional bit test when default destination is unreachable
Otherwise we end up with an extra conditional jump, following by an
unconditional jump off the end of a function. ie.

  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
  bb.1:
    BT32rr ..
    JCC_1 %bb.2 ...
    JMP_1 %bb.3
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

  Should be equivalent to:
  bb.0:
    BT32rr ..
    JCC_1 %bb.4 ...
    JMP_1 %bb.2
  bb.1:
  bb.2:
    ...
  bb.3.unreachable:
  bb.4:
    ...

This can occur since at the higher level IR (Instruction) SwitchInsts
are required to have BBs for default destinations, even when it can be
deduced that such BBs are unreachable.

For most programs, this isn't an issue, just wasted instructions since the
unreachable has been statically proven.

The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to
boot though once D106056 is re-applied.  D106056 makes it more likely
that correlation-propagation (CVP) can deduce that the default case of
SwitchInsts are unreachable. The x86_64 kernel uses a binary post
processor called objtool, which emits this warning:

vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't
find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b

I haven't debugged precisely why this causes a failure at boot time, but
fixing this very obvious jump off the end of the function fixes the
warning and boot problem.

Link: https://bugs.llvm.org/show_bug.cgi?id=50080
Fixes: https://github.com/ClangBuiltLinux/linux/issues/679
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D109103
2021-09-08 11:03:47 -07:00
Kirill Stoimenov 3f875134a7 [asan] Fixed the jump to use the 4 byte offset version.
This should have been the 4 byte version in the first place. Unfortunatelly there is no easy way to add a test as both the 1 byte and 4 byte version are printed as 'jmp' in the assembly code.

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D109453
2021-09-08 17:58:12 +00:00
Wouter van Oortmerssen a99fb86c65 [WebAssembly] Change WebAssemblyMCLowerPrePass to ModulePass
It was a FunctionPass before, which subverted its purpose to collect ALL symbols before MCLowering, depending on how LLVM schedules function passes.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51555

Differential Revision: https://reviews.llvm.org/D109202
2021-09-08 10:47:43 -07:00
Craig Topper aca14c8cf1 [RISCV] Remove unused tablegen template parameters. NFC
Identified in D109359
2021-09-08 10:01:42 -07:00
Craig Topper b04c09c07c [RISCV] Use V0 instead of VMV0: for mask vectors in isel patterns.
This is consistent with the RVV intrinsic patterns. This has been
shown to prevent some "ran out of registers" errors in our internal
testing.

Unfortunately, there are some regressions on LMUL=8 tests in here.
I think the lack of registers with LMUL=8 just makes it very hard
to schedule correctly.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109245
2021-09-08 09:46:21 -07:00
Benjamin Kramer 373b7622c1 [IROutliner] Remove unused variable. NFC. 2021-09-08 18:33:41 +02:00
Roman Lebedev 0852f8706b
[X86] X86DAGToDAGISel::matchBitExtract(): support 'num high bits to clear' pattern
Currently, we only deal with the case where we can match
the number of low bits to be kept, i.e.:
```
x & ((1 << y) - 1)
```
will extract low `y` bits of `x`.

But what will
```
x & (-1 >> y)
```
do?

Logically, it will extract `bitwidth(x) - y` low bits, i.e.:
```
x & ~(-1 << (bitwidth(x)-y))
```
... except we can't do such a transformation in IR in general,
because if we wanted to extract all the bits `(-1 >> 0)` is fine,
but `-1 << bitwidth(x)` would be `poison`: https://alive2.llvm.org/ce/z/BKJZfw,

Yet, here with BMI's BEXTR and BMI2's BZHI we don't have any such problems with edge-cases.
So what we can do is: https://alive2.llvm.org/ce/z/gm5M2B

As briefly discussed with @craig.topper, this appears to be not worse than what we'd end up with currently (a pair of shifts):
* https://godbolt.org/z/nsPb8bejs (direct data dependency, sequential execution)
* https://godbolt.org/z/7bj3zeh1d (no direct data dependency, parallel execution)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107923
2021-09-08 19:27:08 +03:00
Craig Topper 1f16191906 [RISCV] Add an GPR def to the Zvlseg SPILL/RELOAD pseudos
The expansion of these pseudos creates ADD instructions. Those
ADDs modify a GPR so that it is no longer contains the same value
as the input base pointer. Therefore, I believe we should have a
GPR as a Def on these instructions and expansion should get the
destination register for the ADDs from that operand.

At least in our tests here this works out so that register
scavenging picks the same register as the base pointer.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109405
2021-09-08 09:23:33 -07:00
Andrew Litteken c172f1ad39 [IROutliner] Adding supports for multiple exits
When we start outlining across branches, there is the possibility that we will have two different blocks with different output locations, or a single branch that goes to two blocks outside of the region that is being outlined. While the CodeExtractor provides most of the mechanisms by using the return value of the extracted function as the input to a switch statement to correctly branch to the correct location, we need special handling for different output schemas to each location.

This is done by repeating the existing storing scheme for each different exit block. We have a map from the return values used, to the basic block that is used to store the outputs for that particular exit block within the outlined function. Then if needed, we create a switch statement for each return block to branch to the correct set of stored outputs.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106993
2021-09-08 08:58:07 -07:00
Kazu Hirata bcfbb3f9ec [IR] Construct SmallVector with iterator ranges (NFC)
Note that arg_operands has been deprecated in favor of args.
2021-09-08 08:54:15 -07:00
Peter Smith b026ce9c8a [MC] Add Subtarget for MAsmParser call to emitCodeAlignment
The call to emitCodeAlignment was missing a STI which is required
after D45962.

emitCodeAlignment has a default parameter of 0 for MaxBytesToEmit.
Explicitly passing 0 here was interpreted as as nullptr for the STI.
This could possibly be avoided by taking STI as a const reference in
emitCodeAlignment.

Differential Revision: https://reviews.llvm.org/D109425
2021-09-08 13:28:24 +01:00
Sjoerd Meijer 88a2031207 [FuncSpec] Fix typo in option description. NFC. 2021-09-08 12:58:46 +01:00
David Green d8d24c64fe [DAG] Fix GT -> GE condition when creating SetCC
79845ed6df folded some setcc(ashr) conditions to setcc, but got
the condition for NE incorrect, using GT where it should be using GE.
2021-09-08 12:41:51 +01:00
Evgeny Leviant 93b09a2a5d [LiveDebugValues] Handle spills of indirect debug values correctly
When handling register spill for indirect debug value LiveDebugValues pass doesn't add
DW_OP_deref operator which may in some cases cause debugger to return value address, instead
of value while machine register holding that address is spilled.

Differential revision: https://reviews.llvm.org/D109142
2021-09-08 14:06:08 +03:00
Fraser Cormack 7fb66d4035 [MemCpyOpt] Fix a variety of scalable-type crashes
This patch fixes a variety of crashes resulting from the `MemCpyOptPass`
casting `TypeSize` to a constant integer, whether implicitly or
explicitly.

Since the `MemsetRanges` requires a constant size to work, all but one
of the fixes in this patch simply involve skipping the various
optimizations for scalable types as cleanly as possible.

The optimization of `byval` parameters, however, has been updated to
work on scalable types in theory. In practice, this optimization is only
valid when the length of the `memcpy` is known to be larger than the
scalable type size, which is currently never the case. This could
perhaps be done in the future using the `vscale_range` attribute.

Some implicit casts have been left as they were, under the knowledge
they are only called on aggregate types. These should never be
scalably-sized.

Reviewed By: nikic, tra

Differential Revision: https://reviews.llvm.org/D109329
2021-09-08 11:21:36 +01:00
Fraser Cormack 2c5568a6a9 [LegalizeTypes][VP] Add promotion support for binary VP ops
This patch extends the preliminary support for vector-predicated (VP)
operation legalization to include promotion of illegal integer vector
types.

Integer promotion of binary VP operations is relatively simple and
piggy-backs on the non-VP logic, but passing the two extra mask and VP
operands through to the promoted operation.

Tests have been added to the RISC-V target to cover the basic scenarios
for integer promotion for both fixed- and scalable-vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108288
2021-09-08 10:22:57 +01:00
Cullen Rhodes 89786c2b99 [AArch64][SME] Fix imm bug in mov vector to tile aliases
Also fixes a warning mentioned in D109359.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109363
2021-09-08 07:42:16 +00:00
Sander de Smalen 981f7d563a [AArch64] Implement extract_subvector for predicates.
This patch implements extract_subvector for predicate types when
the input type is more than twice the size of the subvector that
is being extracted.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D109314
2021-09-08 08:18:34 +01:00
Max Kazantsev 29d054bf12 [SimplifyCFG] Preserve knowledge about guarding condition by adding assume
This improvement adds "assume" after removal of branch basing on UB in successor block.

Consider the following example:

```
pred:
  x = ...
  cond = x > 10
  br cond, bb, other.succ

bb:
  phi [nullptr, pred], ... // other possible preds
  load(phi) // UB if we came from pred

other.succ:
  // here we know that x <= 10, but this knowledge is lost
  // after the branch is turned to unconditional unless we
  // preserve it with assume.
```

If we remove the branch basing on knowledge about UB in a successor block,
then the fact that x <= 10 is other.succ might be lost if this condition is
not inferrable from any dominating condition. To preserve this knowledge, we
can add assume intrinsic with (possibly inverted) branch condition.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109054
Reviewed By: lebedev.ri
2021-09-08 14:05:17 +07:00
Justin Latimer b0d4d969e2 [AVR] Add support for the tinyAVR 0-series and tinyAVR 1-series
Reviewed By: Dylan McKay, Ben Shi

Differential Revision: https://reviews.llvm.org/D103136
2021-09-08 02:35:26 +00:00
Ben Shi f0460fa4eb [AArch64] Improve target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding if it leads to worse code.

Reviewed By: dmgreen, kda

Differential Revision: https://reviews.llvm.org/D108871
2021-09-08 01:51:26 +00:00
Wang, Pengfei 9d7d34c769 [X86][MS] Fix the aligement mismatch of vector variable arguments on Win32
The alignment of vector variable arguments in callee side is 4, which is
aligned with MSVC. But the caller aligns them to the size of vector
arguments. It results in run fails. This patch fixes this problem by
trimming it to 4 bytes for variable arguments on Win32.

Fixed vector arguments are passed by pointer on Win32. So they don't have
the problem.

I don't find a doc in MSDN for this calling conversion, so I did several
experiments here: https://godbolt.org/z/n1zn1Gx1z

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D108887
2021-09-08 09:26:44 +08:00
Philip Reames 6cdca906c7 [SCEV] Use no-self-wrap flags infered from exit structure to compute trip count
The basic problem being solved is that we largely give up when encountering a trip count involving an IV which is not an addrec. We will fall back to the brute force constant eval, but that doesn't have the information about the fact that we can't cycle back through the same set of values.

There's a high level design question of whether this is the right place to handle this, and if not, where that place is. The major alternative here would be to return a conservative upper bound, and then rely on two invocations of indvars to add the facts to the narrow IV, and then reconstruct SCEV. (I have not implemented the alternative and am not 100% sure this would work out.) That's arguably more in line with existing code, but I find this substantially easier to reason about.  During review, no one expressed a strong opinion, so we went with this one.

Differential Revision: D108651
2021-09-07 17:00:02 -07:00
Heejin Ahn a1d522939c [WebAssembly] Error out on indirect uses of setjmp
Both Wasm & Emscripten SjLj handling has a restriction that `setjmp`
cannot be called indirectly. I thought we have been erroring out on
indirect uses of `setjmp`, but some recent CL disrupted the logic and
we are not erroring out anymore.

We currently
1. Collect functions that contain `setjmp` calls in `SetjmpUsers`. This
   only counts direct calls:
   8f77dc459e/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L869-L878)
2. Run `runSjLjOnFunction` only on those `SetjmpUsers`. Within
   `runSjLjOnFunction`, if we see an indirect use of `setjmp`, we error
   out:
   8f77dc459e/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L1218-L1221)

So if there are only indirect setjmp calls within the module,
`SetjmpUsers` will be empty, and `runSjLjOnFunction` is not even entered
once. And the indirect `setjmp` call will error out at link time. So in
this CL we check for the indirect uses of `setjmp` upfront before we
enter `runSjLjOnFunction`.

Also this currently errors out on `invoke @setjmp`, which can only occur
when using Wasm EH + Wasm SjLj within a function. We recently added Wasm
SjLj support but we don't support using Wasm EH + Wasm SjLj in the same
function yet. We plan to add this support very soon, so I don't think
it's worth creating another test file just for this. (This is an error
test so it needs its own file)

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D109375
2021-09-07 15:52:58 -07:00
Arthur Eubanks 39e2e3bddb [NFC][C API] Make LLVMSetInstrParamAlignment's index param type LLVMAttributeIndex
It's the same as unsigned, but clearer in intent.
2021-09-07 15:13:45 -07:00
Rainer Orth 08ba87fa4b [Support] Implement getMainExecutable on Solaris
Many `flang` tests currently `FAIL` on Solaris because the module files
aren't found.  I could trace this to `sys::fs::getMainExecutable` not being
implemented.

This patch does this and fixes all affected `flang` tests.

Tested on `amd64-pc-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D109374
2021-09-07 22:56:10 +02:00
Philip Reames 9659069978 [SCEV] Further clarify comments regarding UB and zero stride
Follow on to D109029. I realized we had no mention of mustprogrress in the comment (as it prexisted mustprogress in the codebase). In the process of adding it, I tweaked the preconditions into something I think is more clear. Note that mustprogress is checked in the code.

Differential Revision: https://reviews.llvm.org/D109091
2021-09-07 13:53:56 -07:00
Sanjay Patel a3c1669b17 [InstCombine] fold icmp equality with 'or' mask ops
This could go either direction since the instruction
count is the same either way, but there are a few
reasons to prefer this:
1. We already do the related transform with 'and'
   (see just above the new code).
2. We try (too hard) to compensate for not having this
   and possibly other folds in transformZExtICmp(),
   and that leads to bugs like https://llvm.org/PR51762 .
3. Codegen looks better across a variety of targets.

https://alive2.llvm.org/ce/z/uEgn4P
2021-09-07 16:34:00 -04:00
Irina Dobrescu 7023cefe61 [AArch64][Global ISel] Add sext/zext of vector extract improvements
This patch adds improvements for sext/zext of a vector extract in Global
ISel.

For example, this piece of code:

define i64 @si64(<4 x i32> %0, i32 %1) {
  %3 = extractelement <4 x i32> %0, i64 1
  %s = sext i32 %3 to i64
  ret i64 %s
}

Used to have this lowering:
si64:
  mov s0, v0.s[1]
  fmov w8, s0
  sxtw x0, w8
  ret

Whereas this patch makes it lower to this:
si64:
  smov x0, v0.h[0]
  ret

Differential Revision: https://reviews.llvm.org/D108137
2021-09-07 21:17:51 +01:00
Arthur Eubanks 4b05341681 Don't check if the result of hasAttrSomewhere is non-zero in CallBase::getReturnedArgOperand()
Index is 0 when the return value has the returned attribute. But the
return value cannot have the returned attribute, so the check is
pointless.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D109334
2021-09-07 12:05:56 -07:00
Elliot Saba ae8507b0df [X86] Don't clobber EBX in stackprobes
On X86, the stackprobe emission code chooses the `R11D` register, which
is illegal on i686.  This ends up wrapping around to `EBX`, which does
not get properly callee-saved within the stack probing prologue,
clobbering the register for the callers.

We fix this by explicitly using `EAX` as the stack probe register.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D109203
2021-09-07 15:00:44 -04:00
Nikita Popov f5832eaaad [UseListOrder] Fix use list order for function operands
Functions can have a personality function, as well as prefix and
prologue data as additional operands. Unused operands are assigned
a dummy value of i1* null. This patch addresses multiple issues in
use-list order preservation for these:

 * Fix verify-uselistorder to also enumerate the dummy values.
   This means that now use-list order values of these values are
   shuffled even if there is no other mention of i1* null in the
   module. This results in failures of Assembler/call-arg-is-callee.ll,
   Assembler/opaque-ptr.ll and Bitcode/use-list-order2.ll.
 * The use-list order prediction in ValueEnumerator does not take
   into account the fact that a global may use a value more than
   once and leaves uses in the same global effectively unordered.
   We should be comparing the operand number here, as we do for
   the more general case.
 * While we enumerate all operands of a function together (which
   seems sensible to me), the bitcode reader would first resolve
   prefix data for all function, then prologue data for all
   functions, then personality functions for all functions. Change
   this to resolve all operands for a given function together
   instead.

Differential Revision: https://reviews.llvm.org/D109282
2021-09-07 20:59:12 +02:00
Arthur Eubanks 7f54009a1f Add missing overloads for Function::addRetAttr(s) 2021-09-07 11:52:22 -07:00
Nikita Popov 58db5f6e95 [ConstFold] Support opaque pointers in constexpr GEPs
Support opaque pointers in SymbolicallyEvaluateGEP() by using the
value type of a GlobalValue base or falling back to i8 if there
isn't one. We don't unconditionally generate i8 GEPs here because
that would lose inrange attribues, and because some optimizations
on globals currently rely on GEP types (e.g. the globals SROA
mentioned in the comment).

Differential Revision: https://reviews.llvm.org/D109297
2021-09-07 20:50:29 +02:00
Andy Kaylor 34528c32d2 Copy Elementtype Attribute to IR at Link step
Copying IR during linking causes a type mismatch due to the field being missing in IRMover/Valuemapper. Adds the full range of typed attributes including elementtype attribute in the copy functions.

Patch by Chenyang Liu

Differential Revision: https://reviews.llvm.org/D108796
2021-09-07 11:41:43 -07:00
Arthur Eubanks b81fc14f2d [NFC][InstCombine] Make check for sret in a vararg function clearer
We're trying to get the parameter index of sret and see if it's part of
a function's varargs.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D109335
2021-09-07 11:19:27 -07:00
Roman Lebedev 35fa7b8ad8
Reland "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)"
This reverts commit 91f7a4fff7,
relanding commit 13ec913bdf.

The original commit was reverted because of (essentially)
https://bugs.llvm.org/show_bug.cgi?id=35922
which has now been addressed by d0eeb64be5.
2021-09-07 21:03:52 +03:00
Nick Desaulniers d0eeb64be5 [X86ISelLowering] avoid emitting libcalls to __mulodi4()
Similar to D108842, D108844, and D108926.

__has_builtin(builtin_mul_overflow) returns true for 32b x86 targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks ARCH=i386 + CONFIG_BLK_DEV_NBD=y builds of the Linux kernel
that are using builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support these builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://bugs.llvm.org/show_bug.cgi?id=35922
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: lebedev.ri, RKSimon

Differential Revision: https://reviews.llvm.org/D108928
2021-09-07 10:44:54 -07:00
Simon Pilgrim 9eda472112 [X86] X86InstrAVX512.td - remove unused template parameters. NFC.
Identified in D109359
2021-09-07 17:38:20 +01:00
Kazu Hirata 5648f7170e [Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC) 2021-09-07 09:19:33 -07:00
Kazu Hirata 5c6338de16 [RISCV] Fix "set but not used" warnings 2021-09-07 09:19:31 -07:00
Dávid Bolvanský 3b5f318f5d [InstCombine] ror/rol(X, RotAmt) == C --> X == rol/ror(C, RotAmt) (PR51567)
```
----------------------------------------
define i1 @src(i32 %0) {
%1:
  %2 = fshl i32 %0, i32 %0, i32 25
  %3 = icmp eq i32 %2, 5
  ret i1 %3
}
=>
define i1 @tgt(i32 %0) {
%1:
  %2 = icmp eq i32 %0, 640
  ret i1 %2
}
Transformation seems to be correct!
```

https://alive2.llvm.org/ce/z/GdY8Jm

Solves PR51567

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109283
2021-09-07 18:04:58 +02:00
Andrew Litteken 81d3ac0cf2 [IROutliner] Adding outlining for single entry/single exit multiblock regions
Using the similarity found from the IRSimilarity Identifier, we take regions with structural similarity, and deduplicate them into a separate function. The Code Extractor is able to provide most of this functionality.

For simplicity, we start by only outlining regions with a single entry and single exit branch, this reduces the complexity in handling phi nodes outside the region, and handling many sets of outputs for each of the different exit blocks.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D106990
2021-09-07 08:51:54 -07:00
Victor Huang 4a226529e2 [PowerPC] Fixed the crash due to early if conversion with fixed CR fields
This patch adds a fix to do early if conversion to select when
conditional branch not using physical register to prevent the crash when
expanding ISEL instruction.

Reviewed By: lei, kamaub, PowerPC

Differential revision: https://reviews.llvm.org/D108302
2021-09-07 10:51:03 -05:00
Simon Pilgrim f8d2cd1428 [X86] Add missing domain to avx512_ord_cmp_sae comis sae patterns
It doesn't appear to be possible to generate this from tests atm, but it matches what we do in sse12_ord_cmp

Fixes unused template arg identified in D109359
2021-09-07 16:20:21 +01:00
Jinsong Ji 042a6564d3 [PowerPC] Guard XSRSP in P8 for FastISel
This is exposed by enabling FastIsel on 64bit AIX.
We are generating XSRSP regardless of the arch,
which may be wrong when -mcpu=pwr7.

The fix is to guard the generation in P8 only.

Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D109365
2021-09-07 15:17:51 +00:00
Sander de Smalen bd576e5ac0 [AArch64][SVE] Improve extract_subvector for predicates.
Using PUNPKLO/HI instead of ZIP1/ZIP2, because that avoids
having to generate a predicate with all lanes inactive (PFALSE).

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D109312
2021-09-07 15:49:29 +01:00
Peter Smith e63455d5e0 [MC] Use local MCSubtargetInfo in writeNops
On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per module MCSubtargetInfo retained
by the targets AsmBackend in favour of passing through the local
MCSubtargetInfo in operation at the time.

On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.

For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.

This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.

I've attempted to take into account the in tree experimental backends.

Differential Revision: https://reviews.llvm.org/D45962
2021-09-07 15:46:19 +01:00
Peter Smith 5e71839f77 [MC] Add MCSubtargetInfo to MCAlignFragment
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.

There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.

For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.

This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.

Note that involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.

Differential Revision: https://reviews.llvm.org/D45961
2021-09-07 15:46:19 +01:00
Michael Liao 640beb38e7 [amdgpu] Enable selection of `s_cselect_b64`.
Differential Revision: https://reviews.llvm.org/D109159
2021-09-07 10:45:07 -04:00
Mirko Brkusanin 6c4b634da6 [AMDGPU][GlobalISel] Legalize G_MUL for non-standard types
Legalizing G_MUL for non-standard types (like i33) generated an error. Putting
minScalar and maxScalar instead of clampScalar. Also using new rule, instead
of widening to the next power of 2, widen to the next multiple of the passed
argument (32 in this case), so instead of widening i65 to i128, we widen it to
i96.

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D109228
2021-09-07 16:33:24 +02:00
Mirko Brkusanin 5263bf583a [AMDGPU][GlobalISel] Legalization of G_ROTL and G_ROTR
Add implementation for the legalization of G_ROTL and G_ROTR machine
instructions. They are very similar to funnel shift instructions, the only
difference is funnel shifts have 3 operands, whereas rotate instructions have
two operands, the first being the register that is being rotated and the second
being the number of shifts. The legalization of G_ROTL/G_ROTR is just lowering
them into funnel shift instructions if they are legal.

Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D105347
2021-09-07 16:33:24 +02:00
Simon Pilgrim 0d48ee2774 [X86] X86InstrSSE.td - remove unused template parameters. NFC.
Identified in D109359
2021-09-07 15:13:05 +01:00
Simon Pilgrim b50a60c234 [X86] X86InstrVecCompiler.td - remove unused template parameters. NFC.
Identified in D109359
2021-09-07 14:46:08 +01:00
Simon Pilgrim fb38795062 [X86] X86InstrFMA.td - remove unused template parameters. NFC.
Identified in D109359
2021-09-07 14:46:07 +01:00
Anton Afanasyev d1f9b21677 [AggressiveInstCombine] Add `AssumptionCache` to aggressive instcombine
Add support for @llvm.assume() to TruncInstCombine allowing
optimizations based on these intrinsics while computing known bits.
2021-09-07 16:45:00 +03:00
Anton Afanasyev 8c0a1940c1 [AggresiveInstCombine] Add wrapper calls for `KnownBits` computing
Precommit before `AssumptionCache` adding: reviews.llvm.org/D109141

Differential Revision: https://reviews.llvm.org/D109288
2021-09-07 16:45:00 +03:00
Sander de Smalen 448d47f743 [AArch64][SVE] Implement all-inactive predicate with PFALSE.
Instead of using a WHILE XZR, XZR instruction, just emit a PFALSE.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D109311
2021-09-07 14:29:02 +01:00
Simon Pilgrim 0a07ae6ebf [KnownBits] Add support for X*X self-multiplication
Add KnownBits handling and unit tests for X*X self-multiplication cases which guarantee that bit1 of their results will be zero - see PR48683.

https://alive2.llvm.org/ce/z/NN_eaR

The next step will be to add suitable test coverage so this can be enabled in ValueTracking/DAG/GlobalISel - currently only a single Analysis/ScalarEvolution test is affected.

Differential Revision: https://reviews.llvm.org/D108992
2021-09-07 11:43:45 +01:00
Mirko Brkusanin 36527cbe02 [AMDGPU][GlobalISel] Legalize memcpy family of intrinsics
Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE.

Corresponding intrinsics are replaced by a loop that uses loads/stores in
AMDGPULowerIntrinsics pass unless their length is a constant lower then
MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that
reaches legalizer should have a const length argument and should be expanded
into appropriate number of loads + stores.

Differential Revision: https://reviews.llvm.org/D108357
2021-09-07 12:24:07 +02:00
Fraser Cormack a823bdf3ab [RISCV][VP] Custom lower VP_STORE and VP_LOAD
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.

One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108999
2021-09-07 10:53:25 +01:00
Fraser Cormack f4dee8cb82 [RISCV][VP] Custom lower VP_SCATTER and VP_GATHER
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.

One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).

Reviewed By: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D108987
2021-09-07 10:43:07 +01:00
Andrew Wei da9ed3dc71 [AArch64] Avoid adding duplicate implicit operands when expanding pseudo insts.
When expanding pseudo insts, in order to create a new machine instr, we use BuildMI,
which will add implicit operands by default. And transferImpOps will also copy implicit
operands from old ones. Finally, duplicate implicit operands are added to the same inst.
Sometimes this can cause correctness issues. Like below inst,
    renamable $w18 = nsw SUBSWrr renamable $w30, renamable $w14, implicit-def dead $nzcv
After expanding, it will become
    $w18 = SUBSWrs renamable $w13, renamable $w14, 0, implicit-def $nzcv, implicit-def dead $nzcv
A redundant implicit-def $nzcv is added, but the dead flag is missing.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D109069
2021-09-07 17:11:58 +08:00
luxufan ffcaa80f7e [RuntimeDyld] Don't use bitwise operation on SymbolRef::Type
Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D109292
2021-09-07 16:58:35 +08:00
Ben Shi 63ca9371c7 [ARM] Implement target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding in DAGCombine if it leads to worse code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D109124
2021-09-07 15:42:43 +08:00
Craig Topper da3ef8b756 [X86] Handle inverted inputs when matching VPTERNLOG from 2 binary ops.
This is a more general version of D109273. Though it doesn't
peek through bitcasts or rearange broadcasts.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D109295
2021-09-06 17:44:52 -07:00
Fangrui Song 76529b4468 [X86] Simplify condition guarding emitCalleeSavedFrameMoves. NFC 2021-09-06 15:54:02 -07:00
Fangrui Song 4f1e410a1b [X86] Simplify two hasFP(F). NFC 2021-09-06 15:47:40 -07:00
Nikita Popov 8d54c8a0c3 [SCEV] Fix applyLoopGuards() with range check idiom (PR51760)
Due to a typo, this replaced %x with umax(C1, umin(C2, %x + C3))
rather than umax(C1, umin(C2, %x)). This didn't make a difference
for the existing tests, because the result is only used for range
calculation, and %x will usually have an unknown starting range,
and the additional offset keeps it unknown. However, if %x already
has a known range, we may compute a result range that is too
small.
2021-09-06 22:22:41 +02:00
Sanjay Patel e1e4bf174b [DAGCombine] Prevent the transform of combine for multi-use operand
The test is based on a miscompile example in:
https://llvm.org/PR51321

Differential Revision: https://reviews.llvm.org/D107692
2021-09-06 15:30:32 -04:00
Andrew Litteken bd4b1b5f6d [IRSim] Adding support for recognizing branch similarity
The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.

This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar.

We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.

Reviewers: paquette, yroux

Differential Revision: https://reviews.llvm.org/D106989
2021-09-06 11:55:38 -07:00
Kazu Hirata 3322354bfc [Support] Qualify auto (NFC)
Identified with readability-qualified-auto.
2021-09-06 09:10:07 -07:00
Jonas Paulsson 118997d8e9 [SelectionDAGBuilder] Bugfix in visitInlineAsm()
In case of a virtual register tied to a phys-def, the register class needs to
be computed. Make sure that this works generally also with fast regalloc by
using TLI.getRegClassFor() whenever possible, and make only the case of
'Untyped' use getMinimalPhysRegClass().

Fixes https://bugs.llvm.org/show_bug.cgi?id=51699.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109291
2021-09-06 17:46:31 +02:00
Sanjay Patel 0d83e72034 [InstCombine] fix infinite loop from shift transform
I'm not sure if there is a better way or another bug
still here, but this is enough to avoid the loop from:
https://llvm.org/PR51657

The test requires multiple blocks and datalayout to
trigger the problem path.
2021-09-06 11:13:39 -04:00
Sanjay Patel c85f450619 [InstCombine] refactor to reduce indent; NFC
This transform should be updated to use better
variable names and code comments. It could
also create the shift-of-shift directly instead
of relying on another combine for that.
2021-09-06 11:13:39 -04:00
Sanjay Patel fbb78668f2 [InstCombine] fix one-use condition for shift transform
This transform is written in a confusing style,
and I suspect it is at fault for a more serious
bug noted in PR51567.

But it's been around forever, so I'm making the
minimal change to fix another bug - it could
increase instructions because it was not checking
uses.
2021-09-06 11:13:39 -04:00
Sanjay Patel 982a15cb3f [InstCombine] early exit to reduce indentation; NFC 2021-09-06 11:13:38 -04:00
Victor Campos 79f9c79aaf [AArch64][MC] Merge FeaturePMU into FeaturePerfMon
FeaturePMU was created in AArch64 to accommodate one missing system
register, PMMIR_EL1, in commit ffcd7698ae.

However, the Performance Monitors extension already had a target
feature, which is called FeaturePerfMon. Therefore, FeaturePMU is
redundant.

This patch removes FeaturePMU and merges its contents into
FeaturePerfMon.

Reviewed By: dnsampaio

Differential Revision: https://reviews.llvm.org/D109246
2021-09-06 14:56:49 +01:00
David Truby b297531ece [AArch64][sve] Prevent incorrect function call on fixed width vector
The isEssentiallyExtractHighSubvector function currently calls
getVectorNumElements on a type that in specific cases might be scalable.
Since this function only has correct behaviour at the moment on scalable
types anyway, the function can just return false when given a fixed type.

Differential Revision: https://reviews.llvm.org/D109163
2021-09-06 14:25:03 +01:00
Sander de Smalen 96f6785bc9 [VectorUtils] Teach findScalarElement to return splat value.
If the vector is a splat of some scalar value, findScalarElement()
can simply return the scalar value if it knows the requested lane
is in the vector.

This is only needed for scalable vectors, because the InsertElement/ShuffleVector
case is already handled explicitly for the fixed-width case.

This helps to recognize an InstCombine fold like:
  extractelt(bitcast(splat(%v))) -> bitcast(%v)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D107254
2021-09-06 10:56:06 +01:00
Tianqing Wang 12fa608af4 [X86] Add CRC32 feature.
d8faf03807 implemented general-regs-only for X86 by disabling all features
with vector instructions. But the CRC32 instruction in SSE4.2 ISA, which uses
only GPRs, also becomes unavailable. This patch adds a CRC32 feature for this
instruction and allows it to be used with general-regs-only.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D105462
2021-09-06 17:24:30 +08:00
Moritz Sichert a0a5964499 [RuntimeDyld] Implemented relocation of TLS symbols in ELF
Differential Revision: https://reviews.llvm.org/D105466
2021-09-06 10:27:43 +02:00
Moritz Sichert f687378603 [RuntimeDyld] Implemented relocation for ELF::R_X86_64_GOTPC32
Differential Revision: https://reviews.llvm.org/D95512
2021-09-06 10:26:37 +02:00
Fangrui Song 0e03450ae4 [AArch64] Remove an uneeded !NeedsWinCFI check. NFC 2021-09-05 21:02:56 -07:00
guopeilin 5f48c144c5 [AArch64][GlobalISel] Use ZExtValue for zext(xor) when invert tb(n)z
Currently, we use SExtValue to decide whether to invert tbz or tbnz.
However, for the case zext (xor x, c), we should use ZExt rather
than SExt otherwise we will generate totally opposite branches.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D108755
2021-09-06 11:12:07 +08:00
David Green 1b83aaaefa [DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold
This appears to produce better code, even if the condition may need to
be replicated.
2021-09-05 16:18:31 +01:00
Simon Pilgrim f114ef3731 [CostModel][X86] Add generic costs for vXi32 MUL -> v2Xi16 PMADDDW folds
Based off the improved fold in D108522

This should eventually allow us to replace the SLM only cost patterns with generic versions.
2021-09-05 16:08:11 +01:00
David Green 8523fb96a6 [DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Given a select_cc producing a constant and a invertion of the constant
for a comparison more than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clear all bits depending on if the value is negative. This is then xor'd
with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ

This includes a OneUseCheck on the Cmp, which seems to make thinks a
little worse and will be removed in a followup.

Differential Revision: https://reviews.llvm.org/D109149
2021-09-05 16:04:01 +01:00
David Green 79845ed6df [DAG] Fold setcc eq with ashr to compare to zero.
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C

Differential Revision: https://reviews.llvm.org/D109214
2021-09-05 14:06:47 +01:00
Dávid Bolvanský 9c476172b9 [InstCombine] stpcpy(d,s) -> strcpy(d,s) if the result is not used 2021-09-05 12:12:07 +02:00
Michael Kruse 650bbc5620 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.

Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-04 19:18:58 -05:00
Fangrui Song e03c8d309a [AsmPrinter] Remove unneeded MCSubtargetInfo temporary after D14346. NFC
The temporary object was used as a workaround when the target parser may
change STI. D14346 made the MCSubtargetInfo argument to
createMCAsmParser const, so we no longer need the temporary object.
2021-09-04 10:50:10 -07:00
Dávid Bolvanský 3a696f6092 [InstCombine] rotate(X,Z) eq/ne rotate(Y,Z) ---> X eq/ne Y (PR51565)
```

----------------------------------------
define i1 @src(i8 %x, i8 %y, i8 %z) {
%0:
  %f = fshl i8 %x, i8 %x, i8 %z
  %f2 = fshl i8 %y, i8 %y, i8 %z
  %r = icmp eq i8 %f, %f2
  ret i1 %r
}
=>
define i1 @tgt(i8 %x, i8 %y, i8 %z) {
%0:
  %r = icmp eq i8 %x, %y
  ret i1 %r
}
Transformation seems to be correct!

```

https://alive2.llvm.org/ce/z/qAZp8f

Solves PR51565

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109271
2021-09-04 18:58:44 +02:00
Bjorn Pettersson 0f0344dd1e [SimpleLoopUnswitch] Inform pass manager when child loops are deleted
As part of the nontrivial unswitching we could end up removing child
loops. This patch add a notification to the pass manager when
that happens (using the markLoopAsDeleted callback).

Without this there could be stale LoopAccessAnalysis results cached
in the analysis manager. Those analysis results are cached based on
a Loop* as key. Since the BumpPtrAllocator used to allocate
Loop objects could be resetted between different runs of for
example the loop-distribute pass (running on different functions),
a new Loop object could be created using the same Loop pointer.
And then when requiring the LoopAccessAnalysis for the loop we
got the stale (corrupt) result from the destroyed loop.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D109257
2021-09-04 17:54:39 +02:00
Shivam Gupta 5449d2da65 [NFC] Run clang-format on llvm/lib/Trget/AVR/
The current inconsistency confuse contributors which coding guidlines to follow.
It would be better to have it consistent using clang-format tool.

Reviewed By: mhjacobson

Differential Revision: https://reviews.llvm.org/D109270
2021-09-04 20:05:15 +05:30
Simon Pilgrim cb8d96e72f Fix Wdocumentation unknown parameter warning. NFCI. 2021-09-04 15:06:53 +01:00
Simon Pilgrim 2005ae15a6 [X86][SLM] WriteVecIMul instructions only take 1uop (REAPPLIED)
The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.

I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.

But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
2021-09-04 15:03:56 +01:00
Simon Pilgrim ac51d69208 Revert rG994da657076900f5ad7fe593c3b5e5f89ab3d53d "[X86][SLM] WriteVecIMul instructions only take 1uop"
This changed some codegen tests that I forgot about in my rebase, I'll recommit shortly with a fix.
2021-09-04 13:39:10 +01:00
Simon Pilgrim 994da65707 [X86][SLM] WriteVecIMul instructions only take 1uop
The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.

I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.

But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
2021-09-04 13:21:34 +01:00
Simon Pilgrim c6371020a8 [X86][SLM] RMW instructions don't require an extra uop
For RMW instructions, the load and store hold the MEC for an extra cycle, but within the same single uop. This is alluded to in the Intel AOM:

"The MEC also owns the MEC RSV, which is responsible for scheduling of all loads and stores. Load and
store instructions go through addresses generation phase in program order to avoid on-the-fly memory
ordering later in the pipeline. Therefore, an unknown address will stall younger memory instructions."

Noticed while trying to get a cheap SLM test box up and running with llvm-exegesis - RMW arithmetic is always 1uop - and matches what Agner / InstLatX64 report as well.
2021-09-04 13:21:34 +01:00
Simon Pilgrim da965a77d5 [X86][SLM] Fix MUL uops, latency and throughput
These were all set to the same best case mul i32 values (which seems to be the only version of MUL that SLM actually performs well with).

Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.
2021-09-04 13:21:34 +01:00
Simon Pilgrim 7d062d2c47 [X86][Atom] MUL/DIV instructions require both ports, not either.
Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM.
2021-09-04 11:58:09 +01:00
Simon Pilgrim 0d0f39b0f3 [X86][Atom] Add missing UOps override to AtomWriteResPair multiclass
Make it easier to describe microcoded instructions.
2021-09-04 11:58:09 +01:00
Nikita Popov 66a54af967 [WebAssembly] Support opaque pointers in AddMissingPrototypes
The change here is basically the same as in D108880: Rather than
looking at bitcasts, look at calls and their function type. We
still need to look through bitcasts to find those calls.

The change in llvm/test/CodeGen/WebAssembly/add-prototypes-conflict.ll
is due to different visitation order. add-prototypes-opaque-ptrs.ll
is a copy of add-prototypes.ll with -force-opaque-pointers.

Differential Revision: https://reviews.llvm.org/D109256
2021-09-04 11:25:42 +02:00
Kazu Hirata bb51f76fb1 [ForceFunctionAttrs] Add const (NFC) 2021-09-03 22:29:58 -07:00
Kevin Athey c7f50a445e Revert "[AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)"
This reverts commit 095bea23d0.

Broke buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/11411
2021-09-03 18:08:58 -07:00
Ben Shi 095bea23d0 [AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding if it leads to worse code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108871
2021-09-04 07:24:23 +08:00
David Blaikie bc066e26c9 DebugInfo: Fix a few bot failures for type dumping fixes 2021-09-03 14:08:58 -07:00
David Blaikie 40f1593558 DebugInfo: Correct/improve type formatting (pointers to function types especially)
This does add some extra superfluous whitespace (eg: "int *") intended
to make the Simplified Template Names work easier - this makes the
DIE-based names match more exactly the clang-generated names, so it's
easier to identify cases that don't generate matching names.

(arguably we could change clang to skip that whitespace or add some
fuzzy matching to accommodate differences in certain whitespace - but
this seemed easier and fairly low-impact)
2021-09-03 12:22:28 -07:00
Sanjay Patel fd807601a7 [InstCombine] fold (rotate X) eq/ne (0/-1)
This generalizes the examples shown in:
https://llvm.org/PR51566

https://alive2.llvm.org/ce/z/V-sEy9
2021-09-03 14:51:35 -04:00
Sanjay Patel d1458903eb [InstCombine] reduce code duplication; NFC 2021-09-03 14:51:35 -04:00
Stanislav Mekhanoshin d0c064715c [AMDGPU] Small cleanup in optimizeCompareInstr. NFC. 2021-09-03 11:31:40 -07:00
David Green adfd12e6d1 [ARM] Add patterns for store(fptosisat(..))
As an extension to D107866, this adds store(fptosisat(..)) patterns,
similar to the existing fptosi patterns, to prevent unnecessarily moving
into gpr regs where we can use fp stores directly.

Differential Revision: https://reviews.llvm.org/D108378
2021-09-03 19:22:11 +01:00
David Green f37e132263 [ARM] Add VFP lowering for fptosi.sat
This extends D107865 to the VFP insructions, lowering llvm.fptosi.sat
and llvm.fptoui.sat to VCVT instructions that inherently perform the
saturate.

Differential Revision: https://reviews.llvm.org/D107866
2021-09-03 18:11:08 +01:00
Craig Topper 75620fadf5 [RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0.
This patch changes the register class to avoid accidentally setting
the AVL operand to X0 through MachineIR optimizations.

There are cases where we really want to use X0, but we can't get that
past the MachineVerifier with the register class as GPRNoX0. So I've
use a 64-bit -1 as a sentinel for X0. All other immediate values should
be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI
insertion pass to avoid touching the rest of the algorithm. In
SelectionDAG lowering I'm using a -1 TargetConstant to hide it from
instruction selection and treat it differently than if the user
used -1. A user -1 should be selected to a register since it doesn't
fit in uimm5.

This is the rest of the changes started in D109110. As mentioned there,
I don't have a failing test from MachineIR optimizations anymore.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109116
2021-09-03 09:19:25 -07:00
David Spickett 02b4620348 [ORC] Static cast more uint64_t to size_t
These instances don't have an obvious way to fail
nicely so I've just asserted they are within range.

Fixes the Arm 32 bit builds.
2021-09-03 12:30:56 +00:00
Max Kazantsev 718157283c [LoopDeletion] Move ICmpInst handling to getValueOnFirstIteration()
As noticed in https://reviews.llvm.org/D105688, it would be great to move
handling of ICmpInst which was in canProveExitOnFirstIteration() to
getValueOnFirstIteration().

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D108978
Reviewed By: reames
2021-09-03 18:36:19 +07:00
Konstantin Schwarz 90d5298759 [GlobalISel] Add convenience constructors to MemDesc
This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D109161
2021-09-03 12:52:18 +02:00
Simon Pilgrim 6ba0b9f68a [X86][SLM] Fix PBLENDVB uops and throughput
SLM PBLENDVB is just as bad as BLENDVPD/PS - so model it as such, fixing the rr vs rm uops diff as well. The Intel AoM appears to have a copy+paste typo with PBLENDW, it doesn't match Agner or InstLatX64.

Noticed while investigating some of the weird discrepancies reported by the D103695 helper script (SLM had much better vector shift throughputs than it should).
2021-09-03 11:31:29 +01:00
gbreynoo e28cd75a50 [OptTable] Reapply Improve error message output for grouped short options
This reapplies 71d7fed3bc which was
reverted by 3e2bd82f02. This change
includes the fix for breaking the sanitizer bots.

As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current
implementation for parsing grouped short options can return unclear
error messages. This change fixes the example given in the ticket in
which a flag is incorrectly given an argument. Also when parsing a
group we now keep reading past the first incorrect option and output
errors for all incorrect options in the group.

Differential Revision: https://reviews.llvm.org/D108770
2021-09-03 11:13:52 +01:00
Florian Mayer abf8ed8a82 [hwasan] Support more complicated lifetimes.
This is important as with exceptions enabled, non-POD allocas often have
two lifetime ends: the exception handler, and the normal one.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108365
2021-09-03 10:29:50 +01:00
Stefan Gränitz 2ed91da0f1 [JITLink] Add initial Aarch64 support
Set up basic infrastructure for 64-bit ARM architecture support in JITLink. It allows for loading a minimal object file and resolving a single relocation. Advanced features like GOT and PLT handling or relaxations were intentionally left out for the moment.

This patch follows the idea to keep implementations for ARM (32-bit) and Aaarch64 (64-bit) separate, because:
* it might be easier to share code with the MachO "arm64" JITLink backend
* LLVM has individual targets for ARM and Aaarch64 as well

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108986
2021-09-03 10:48:06 +02:00
Jingu Kang 562521e2d1 [LoopBoundSplit] Update phi node in exit block
It fixes https://bugs.llvm.org/show_bug.cgi?id=51700

Differential Revision:
2021-09-03 09:10:50 +01:00
Cullen Rhodes dc5dd77ac7 [AArch64][SME] Support NEON vector to GPR integer moves in streaming mode
A small subset of the NEON instruction set is legal in streaming mode.
This patch adds support for the following vector to integer move
instructions:

  0x00 1110 0000 0001 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.B[0]
  0x00 1110 0000 0010 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.H[0]
  0100 1110 0000 0100 0010 11xx xxxx xxxx # SMOV Xd,Vn.S[0]
  0000 1110 0000 0001 0011 11xx xxxx xxxx # UMOV Wd,Vn.B[0]
  0000 1110 0000 0010 0011 11xx xxxx xxxx # UMOV Wd,Vn.H[0]
  0000 1110 0000 0100 0011 11xx xxxx xxxx # UMOV Wd,Vn.S[0]
  0100 1110 0000 1000 0011 11xx xxxx xxxx # UMOV Xd,Vn.D[0]

Only the zero index variants are legal, all others indexes are illegal.
To support this, new instructions are defined specifically for zero
index which is hardcoded, along an implicit 'VectorIndex0' operand.
Since the index operand is implicit and takes no bits in the encoding,
custom decoding is required to add the operand.

I'm not sure if this is the best approach but the predicate constraint
on a subset of an operand is unusual. Would be interested to hear some
alternatives.

The instructions are predicated on 'HasNEONorStreamingSVE', i.e. they're
enabled by either +neon or +streaming-sve. This follows on from the work
in D106272 to support the subset of SVE(2) instructions that are legal
in streaming mode.

Depends on D107902.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D107903
2021-09-03 07:59:17 +00:00
Cullen Rhodes 1dcd900d1d [AArch64][ISel] NFC: DAG.getMachineFunction() -> MF
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109135
2021-09-03 07:59:17 +00:00
Amara Emerson 6d9505b8e0 [AArch64][GlobalISel] Support for folding G_ROTR as shifted operands.
This allows selection like: eor w0, w1, w2, ror #8

Saves 500 bytes on ClamAV -Os, which is 0.1%.

Differential Revision: https://reviews.llvm.org/D109206
2021-09-02 21:37:24 -07:00
Qiu Chaofan d0f9553ef5 [PowerPC] Enable fast-isel on AIX 64 subtarget
This patch basically enables fast-isel for AIX 64-bit subtarget
(previously enabled only for ELF 64). The initial motivation is to
introduce branch folding to AIX generated code for correct debug
behavior. I also saw some compiling time improvement in a few LLVM
test-suite benchmarks. (toast, dbms, cjpeg, burg, etc.)

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D98844
2021-09-03 11:33:45 +08:00
Chen Zheng 34badc409c Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount."
This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and
is not a right patch according to comments in D91724

This reverts commit 42eaf4fe0a.
2021-09-03 02:55:43 +00:00
Matt Arsenault 79bcd4a7db AMDGPU: Remove FeatureLocalMemorySize0
There's no reason to make this an explicit feature, since it's implied
by the lack of a feature with a size.
2021-09-02 22:43:01 -04:00
Alexander Pivovarov 6cd4b508a8 [RISCV] Add SiFive core S51
Add SiFive core s51 as rv64imac RocketModel

Reviewed-By: MaskRay, evandro
Differential Revision: https://reviews.llvm.org/D108886
2021-09-02 18:45:25 -07:00
PeixinQiao a42380ce83 [OMPIRBuilder] Add ordered directive to OMPBuilder
Add support for ordered directive in the OpenMPIRBuilder.

This patch also modidies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.

Also fix one ICE when parsing one canonical for loop with the relational
operator LE or GE in openmp region by replacing unary increment
operation of the expression of the variable "Expr A" minus the variable
"Expr B" (++(Expr A - Expr B)) with binary addition operation of the
experssion of the variable "Expr A" minus the variable "Expr B" and the
expression with constant value "1" (Expr A - Expr B + "1").

Reviewed By: Meinersbur, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107430
2021-09-03 09:37:58 +08:00
Anna Thomas f661ce209f [LoopPredication] Fix MemorySSA crash in predicateLoopExits
The attached testcase crashes without the patch (Not the same accesses
in the same order).

When we move instructions before another instruction, we also need to
update the memory accesses corresponding to it.

Reviewed-By: asbirlea
Differential Revision: https://reviews.llvm.org/D109197
2021-09-02 21:26:07 -04:00
Alexander Pivovarov 1104e3258b Fix typo in RISCVMatInt.cpp comments 2021-09-02 18:11:09 -07:00
Stanislav Mekhanoshin 78fbd1aa3d [AMDGPU] Process any power of 2 in optimizeCompareInstr
Differential Revision: https://reviews.llvm.org/D109201
2021-09-02 17:39:17 -07:00
Xun Li 2cf30c4769 [Coroutines] Only run verifyFunction in debug mode
verifyFunction can be really slow on large functions. This can significantly slow down compilation in production.
Given that coroutine passes are fairly stable now, we should only run it in debug mode.

Differential Revision: https://reviews.llvm.org/D109198
2021-09-02 17:35:01 -07:00
Wenlei He 054487c5b2 [CSSPGO] Honor preinliner decision for ThinLTO importing
When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 17:29:26 -07:00
Stanislav Mekhanoshin 2cfda6a691 [AMDGPU] Fold immediates in the optimizeCompareInstr
Peephole works before the first SIFoldOperands so most of
the immediates are in registers.

Differential Revision: https://reviews.llvm.org/D109186
2021-09-02 17:23:26 -07:00
Sam Clegg c32884c482 [WebAssembly] Rename WrapperPIC -> WrapperREL. NFC
This ISD node/wrapper represents am address which is relative to a base
address and therefore lowers to `i32.const` rather than `global.get`.

Use this wrapper type for TLS-relative addresses, paving the way for the
non-REL wrapper to be used to external TLS address once those are
supported.

Differential Revision: https://reviews.llvm.org/D109179
2021-09-02 20:04:34 -04:00
Philip Reames fa82a3d016 [runtimeunroll] Support epilogue unrolling with a parent loop
This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop.  When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop.

Differential Revision: https://reviews.llvm.org/D108476
2021-09-02 16:29:20 -07:00
Philip Reames 45c672e20d [runtimeunroll] Under EXPENSIVE_CHECKS, validate loop info
Requested in review comment on D108476
2021-09-02 16:28:46 -07:00
Lang Hames dad60f8071 [ORC] Add EPCGenericJITLinkMemoryManager: memory management via EPC calls.
All ExecutorProcessControl subclasses must provide a JITLinkMemoryManager object
that can be used to allocate memory in the executor process. The
EPCGenericJITLinkMemoryManager class provides an off-the-shelf
JITLinkMemoryManager implementation for JITs that do not need (or cannot
provide) a specialized JITLinkMemoryManager implementation. This simplifies the
process of creating new ExecutorProcessControl implementations.
2021-09-03 08:28:29 +10:00
Jessica Paquette 844d8e0337 [GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1
This adds the following combines:

```
x = ... 0 or 1
c = icmp eq x, 1

->

c = x
```

and

```
x = ... 0 or 1
c = icmp ne x, 0

->

c = x
```

When the target's true value for the relevant types is 1.

This showed up in the following situation:

https://godbolt.org/z/M5jKexWTW

SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.

This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)

Differential Revision: https://reviews.llvm.org/D109130
2021-09-02 15:05:31 -07:00
Kirill Stoimenov cf53c6c971 [asan] Fixed link error by setting jump symbol to R_X86_64_PLT32.
Fixing this link error:
ld: error: relocation R_X86_64_PC32 cannot be used against symbol __asan_report_load...; recompile with -fPIC

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109183
2021-09-02 21:50:56 +00:00
Kevin Athey 04ed6e7afc Revert "[CSSPGO] Honor preinliner decision for ThinLTO importing"
This reverts commit a2768b4732.

Breaks sanitizer-x86_64-linux-fast buildbot:
https://lab.llvm.org/buildbot/#/builders/5/builds/11334

Log snippet:
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/early-inline.ll (65549 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/early-inline.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -instcombine -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/einline.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fbf4de4009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/inline-cold.ll (65643 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/inline-cold.ll' FAILED ********************
Script:
--
: 'RUN: at line 4';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 5';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 8';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 11';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=9999999 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 14';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=-500 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fcd534a209a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  LLVM :: Transforms/SampleProfile/early-inline.ll
  LLVM :: Transforms/SampleProfile/inline-cold.ll
2021-09-02 14:48:31 -07:00
Arthur Eubanks 813a7f1ad7 [MemorySSA] Properly handle liveOnEntry in the walker printer
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109177
2021-09-02 12:51:27 -07:00
Arthur Eubanks 92b94a6d0c [Verifier] Only allow invariant.group metadata on stores and loads
As specified by https://llvm.org/docs/LangRef.html#invariant-group-metadata.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109182
2021-09-02 12:49:04 -07:00
Sam Clegg 4664590d53 [WebAssemlby] Remove redundant SDTypeProfile. NFC
I added this back in https://reviews.llvm.org/D54647 but it wasn't
actually needed.

Differential Revision: https://reviews.llvm.org/D109176
2021-09-02 15:21:22 -04:00
Wenlei He f7fff46acc [CSSPGO] Allow inlining recursive call for preinliner
When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining.

However, it turns out that we use InlineCost::Never for both illeagle inlining and some of the "not-so-beneficial" inlining.

The most common one is recursive inlining, while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in top-down manner.

Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change.

This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used.

With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks)

Differential Revision: https://reviews.llvm.org/D109104
2021-09-02 11:24:27 -07:00
Nikita Popov c86e1ce73b [SCEVExpander] Simplify pointer overflow check
This is a followup to D104662 to generate slightly nicer code for
pointer overflow checks. Bypass expandAddToGEP and instead
explicitly generate i8 GEPs. This saves some bitcasts and negates
the value in a more obvious way. In particular, this prevents SCEV
from looking through the umul.with.overflow, same as in the integer
case.

The wrapping-pointer-ni.ll test deserves a comment: Previously,
this generated a typed GEP which used the umulo argument rather
than the multiplication result. This results in more compact IR in
that case, but effectively does the multiplication twice, the
second one is just hidden in the GEP. Reusing the umulo result
seems pretty reasonable to me.

Differential Revision: https://reviews.llvm.org/D109093
2021-09-02 20:15:59 +02:00
Sam Clegg ad2f94f398 [WebAssembly] Fix names of WebAssemblyWrapper SDNodes. NFC
Other platforms all use CamelCase as normal for these wrapper nodes.

Differential Revision: https://reviews.llvm.org/D109172
2021-09-02 13:54:44 -04:00
Heejin Ahn 28780e59f6 [WebAssembly] Add Wasm SjLj support
This add support for SjLj using Wasm exception handling instructions:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md

This does not yet support the mixed use of EH and SjLj within a
function. It will be added in a follow-up CL.

This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s,
except for the below:
- `test_longjmp_standalone`: Uses Node
- `test_dlfcn_longjmp`: Uses NodeRAWFS
- `test_longjmp_throw`: Mixes EH and SjLj
- `test_exceptions_longjmp1`: Mixes EH and SjLj
- `test_exceptions_longjmp2`: Mixes EH and SjLj
- `test_exceptions_longjmp3`: Mixes EH and SjLj

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D108960
2021-09-02 10:51:02 -07:00
Nick Desaulniers 6860b136b9 [MipsISelLowering] avoid emitting libcalls to __multi3
Similar to D108842 and D108844.

__has_builtin(builtin_mul_overflow) returns true for 32b MIPS targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks MIPS malta_defconfig builds of the Linux kernel that are
using __builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support malta_defconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108926
2021-09-02 10:41:37 -07:00
Daniil Suchkov 5c97507e2b [InlineCost] Introduce attributes to override InlineCost for inliner testing
This patch introduces four new string attributes: function-inline-cost,
function-inline-threshold, call-inline-cost and call-threshold-bonus.
These attributes allow you to selectively override some aspects of
InlineCost analysis. That would allow us to test inliner separately from
the InlineCost analysis.

That could be useful when you're trying to write tests for inliner and
you need to test some very specific situation, like "the inline cost has
to be this high", or "the threshold has to be this low". Right now every
time someone does that, they have get creative to come up with a way to
make the InlineCost give them the number they need (like adding ~30
load/add pairs for a trivial test). This process can be somewhat tedious
which can discourage some people from writing enough tests for their
changes. Also, that results in tests that are fragile and can be easily
broken without anyone noticing it because the test writer can't
explicitly control what input the inliner will get from the inline cost
analysis.

These new attributes will alleviate those problems to an extent.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D109033
2021-09-02 17:35:06 +00:00
Craig Topper 3e89cc5cda [X86] Remove isel predicates for xgetbv/xsetbv instructions so they can work on Windows.
https://reviews.llvm.org/D56686  was supposed to allow these to
work on Windows without needing to enable the xsave feature to
match MSVC. It seems this didn't work because the backend isel
patterns would still block it.

This patch removes the predicates from the isel patterns.

Fixes PR51706.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D109097
2021-09-02 10:25:02 -07:00
Stanislav Mekhanoshin 832c87b4fb [AMDGPU] Use S_BITCMP0_* to replace AND in optimizeCompareInstr
These can be used for reversed conditions if result of the AND
is unused except in the compare:

s_cmp_eq_u32 (s_and_b32 $src, 1), 0 => s_bitcmp0_b32 $src, 0
s_cmp_eq_i32 (s_and_b32 $src, 1), 0 => s_bitcmp0_b32 $src, 0
s_cmp_eq_u64 (s_and_b64 $src, 1), 0 => s_bitcmp0_b64 $src, 0
s_cmp_lg_u32 (s_and_b32 $src, 1), 1 => s_bitcmp0_b32 $src, 0
s_cmp_lg_i32 (s_and_b32 $src, 1), 1 => s_bitcmp0_b32 $src, 0
s_cmp_lg_u64 (s_and_b64 $src, 1), 1 => s_bitcmp0_b64 $src, 0

Differential Revision: https://reviews.llvm.org/D109099
2021-09-02 09:38:01 -07:00
Simon Pilgrim d66d520fe1 [X86][SSE] combineMulToPMADDWD - improve recognition of sign/zero extended upper bits
PMADDWD(v8i16 x, v8i16 y) == (v4i32) { (int)x[0]*y[0] + (int)x[1]*y[1], ..., (int)x[6]*y[6] + (int)x[7]*y[7] }

Currently combineMulToPMADDWD only folds cases where the upper 17 bits of both vXi32 inputs are known zero (i.e. the first half is positive and the second half of the pair is zero in each 2xi16 pair), this can be relaxed to only require one zero-extended input if the other input has at least 17 sign bits.

That way the sign of the result is still preserved, and the second half is still zero.

Noticed while investigating PR47437.

Differential Revision: https://reviews.llvm.org/D108522
2021-09-02 17:36:22 +01:00
Kazu Hirata e1bb54b593 [clangd, llvm] Remove redundant calls to c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-09-02 09:07:13 -07:00
Wenlei He a2768b4732 [CSSPGO] Honor preinliner decision for ThinLTO importing
When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 08:24:06 -07:00
Bradley Smith 14e1a4a6ee [AArch64][SVE] Workaround incorrect types when lowering fixed length gather/scatter
When lowering a fixed length gather/scatter the index type is assumed to
be the same as the memory type, this is incorrect in cases where the
extension of the index has been folded into the addressing mode.

For now add a temporary workaround to fix the codegen faults caused by
this by preventing the removal of this extension. At a later date the
lowering for SVE gather/scatters will be redesigned to improve the way
addressing modes are handled.

As a short term side effect of this change, the addressing modes
generated for fixed length gather/scatters will not be optimal.

Differential Revision: https://reviews.llvm.org/D109145
2021-09-02 15:07:24 +00:00
Craig Topper b5fd6b46f5 [RISCV] Teach instruction selection to elide sext.w in some cases.
If a sext_inreg is up for isel, and all its users are W instructions,
we can skip emitting the sext_inreg. This helpful if the producing
instruction can't become a W instruction.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D108966
2021-09-02 07:54:34 -07:00
Evandro Menezes 5ebdb07e7e [RISCV] Enable shrink wrap by default
Differential Revision: https://reviews.llvm.org/D109037
2021-09-02 09:47:58 -05:00
Craig Topper e4e69ba4d1 [RISCV] Split PseudoVSETVLI into 2 instructions to allow different register classes for rs1.
X0 has special meaning for vsetvli, we need to make sure we never
create it a vsetvli that uses it by accident. This could happen
if the register coalescer coalesces a copy from X0 into this
instruction.

This patch splits the instruction so that we can have GPRNoX0
register class to use for the cases where we don't want the source
to be X0. The verifier won't let us explicitly use X0 on a GPRNoX0
operand so we need a separate pseudo for those cases.

I don't currently have a failing example for this. There was a
failure in D107957, but the coalescable copy from that example
should have been optimized away much earlier so I've fixed that.

This is not a complete fix. We still need to prevent the same
possible issue on the AVL operand of all of the vector instruction
pseudos. I don't want to make two versions of all of those so we
need to find a different solution for those. I have an idea I'm
going to try.

Differential Revision: https://reviews.llvm.org/D109110
2021-09-02 07:45:31 -07:00
Piotr Sobczak 30d6c39bca [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM
Extend SILoadStoreOptimizer to merge into DWORDX8 variant of S_BUFFER_LOAD.

Merging into DWORDX2 and DWORDX4 variants is handled already.

Differential Revision: https://reviews.llvm.org/D108909
2021-09-02 16:26:25 +02:00
David Green 9cb8f4d1ad [ARM] Add a tail-predication loop predicate register
The semantics of tail predication loops means that the value of LR as an
instruction is executed determines the predicate. In other words:

mov r3, #3
DLSTP lr, r3        // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only first lane is updated.

This means that the value of lr cannot be spilled and re-used in tail
predication regions without potentially altering the behaviour of the
program. More lanes than required could be stored, for example, and in
the case of a gather those lanes might not have been setup, leading to
alignment exceptions.

This patch adds a new lr predicate operand to MVE instructions in order
to keep a reference to the lr that they use as a tail predicate. It will
usually hold the zeroreg meaning not predicated, being set to the LR phi
value in the MVETPAndVPTOptimisationsPass. This will prevent it from
being spilled anywhere that it needs to be used.

A lot of tests needed updating.

Differential Revision: https://reviews.llvm.org/D107638
2021-09-02 13:42:58 +01:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Simon Pilgrim b0acd6c369 [X86] Fold PMADD(x,0) or PMADD(0,x) -> 0
Pulled out of D108522 - handle zero-operand cases for PMADDWD/VPMADDUBSW ops
2021-09-02 10:48:50 +01:00
Roman Lebedev 50634deaa5
Revert "[OpenMP][OpenMPIRBuilder] Implement loop unrolling."
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "LLVMFrontendOpenMP" of type SHARED_LIBRARY
    depends on "LLVMPasses" (weak)
  "LLVMipo" of type SHARED_LIBRARY
    depends on "LLVMFrontendOpenMP" (weak)
  "LLVMCoroutines" of type SHARED_LIBRARY
    depends on "LLVMipo" (weak)
  "LLVMPasses" of type SHARED_LIBRARY
    depends on "LLVMCoroutines" (weak)
    depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed.  Build files cannot be regenerated correctly.
```

This reverts commit 707ce34b06.
2021-09-02 12:42:23 +03:00
Fraser Cormack ef78f2106c [LegalizeTypes][VP] Add splitting support for binary VP ops
This patch extends D107904's introduction of vector-predicated (VP)
operation legalization to include vector splitting.

When the result of a binary VP operation needs splitting, all of its
operands are split in kind. The two operands and the mask are split as
usual, and the vector-length parameter EVL is "split" such that the low
and high halves each execute the correct number of elements.

Tests have been added to the RISC-V target to show splitting several
scenarios for fixed- and scalable-vector types. Without support for
`umax` (e.g. in the `B` extension) the generated code starts to branch.
Ideally a cost model would prevent their insertion in the first place.

Through these tests many opportunities for better codegen can be seen:
combining known-undef VP operations and for constant-folding operations
on `ISD::VSCALE`, to name but a few.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107957
2021-09-02 10:15:53 +01:00
Simon Moll ea2cdbf5e6 [VP] Declaration and docs for vp.select intrinsic
llvm.vp.select extends the regular select instruction with an explicit
vector length (%evl).

All lanes with indexes at and above %evl are
undefined. Lanes below %evl are taken from the first input where the
mask is true and from the second input otherwise.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D105351
2021-09-02 11:17:14 +02:00
David Sherwood d581d94385 [SVE] Fix the FP arithmetic instruction costs for SVE
Several FP instructions (fadd, fsub, etc.) were incorrectly assigned
a higher cost for SVE because they have custom lowering, however we
know they are legal. This patch explicitly assigns a cost of 2 to
these opcodes.

Tests added here:

  Analysis/CostModel/AArch64/arith-fp-sve.ll

Differential Revision: https://reviews.llvm.org/D108993
2021-09-02 09:55:13 +01:00
Fangrui Song dfb7518df1 [MC] Set SHF_INFO_LINK on SHT_REL/SHT_RELA sections
sh_info links to a section, therefore SHF_INFO_LINK should be set as GNU as
does. The issue has been benign because linkers kindly combines relocation
sections w/ and w/o the flag.
2021-09-02 01:00:51 -07:00
Michael Kruse 707ce34b06 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-02 02:37:25 -05:00
Wenlei He c000b8bd5c [CSSPGO] Use preinliner decision by default when available
For CSSPGO, turn on `sample-profile-use-preinliner` by default. This simplifies the use of llvm-profgen preinliner as it's now simply driven by ContextShouldBeInlined flag for each context profile without needing extra compiler switch.

Note that llvm-profgen's preinliner is still off by default, under switch `csspgo-preinliner`.

Differential Revision: https://reviews.llvm.org/D109111
2021-09-01 23:45:38 -07:00
Markus Lavin 304f2bd21d [NPM] Added opt option -print-pipeline-passes.
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.

As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass

At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
  (currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
  BitcodeWriterPass).

Differential Revision: https://reviews.llvm.org/D108298
2021-09-02 08:23:33 +02:00
Markus Lavin 645af79e8e Revert "[NPM] Added opt option -print-pipeline-passes."
This reverts commit c71869ed4c.
2021-09-02 08:22:17 +02:00
Markus Lavin c71869ed4c [NPM] Added opt option -print-pipeline-passes.
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.

As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass

At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
  (currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
  BitcodeWriterPass).
2021-09-02 08:16:51 +02:00
Abinav Puthan Purayil 0baace5379 [DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine().
Differential Revision: https://reviews.llvm.org/D107551
2021-09-02 11:33:14 +05:30
Jinsong Ji 8671191d26 [NFC][PowerPC] Small code refactor in LoopInstrFormPrep
Avoid some duplicate code.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D109083
2021-09-02 03:16:01 +00:00
Arthur Eubanks 7b08d9da55 Reland [MemorySSA] Add pass to print results of MemorySSA walker
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109028
2021-09-01 18:58:57 -07:00
Arthur Eubanks 0f63496ea4 Revert "[MemorySSA] Add pass to print results of MemorySSA walker"
This reverts commit 8f98477c2d.

Breaks bots
2021-09-01 18:45:19 -07:00
Chen Zheng 2596120199 [PowerPC] small code format refactor ; NFC
address the code review comments in patch https://reviews.llvm.org/D105872
2021-09-02 01:39:32 +00:00
Arthur Eubanks 8f98477c2d [MemorySSA] Add pass to print results of MemorySSA walker
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109028
2021-09-01 18:29:15 -07:00
Philip Reames bb0fa3ea02 Revert "snapshot - do not push"
This reverts commit 91f4655d92.

This wasn't intented to be pushed, sorry.
2021-09-01 16:59:23 -07:00
Philip Reames c3b3aa277a Fix a missing MemorySSA update in breakLoopBackedge
This is a case I'd missed in 6a8237. The odd bit here is that missing the edge removal update seems to produce MemorySSA which verifies, but is still corrupt in a way which bothers following passes. I wasn't able to reduce a single pass test case, which is why the reported test case is taken as is.

Differential Revision: https://reviews.llvm.org/D109068
2021-09-01 16:59:01 -07:00
Philip Reames 91f4655d92 snapshot - do not push 2021-09-01 16:59:01 -07:00
Jon Roelofs 9237eda304 Revert "[AArch64][GlobalISel] Legalize bswap <2 x i16>"
This reverts commit 5cd63e9ec2.

https://bugs.llvm.org/show_bug.cgi?id=51707

The sequence feeding in/out of the rev32/ushr isn't quite right:

 _swap:
         ldr     h0, [x0]
         ldr     h1, [x0, #2]
-        mov     v0.h[1], v1.h[0]
+        mov     v0.s[1], v1.s[0]
         rev32   v0.8b, v0.8b
         ushr    v0.2s, v0.2s, #16
-        mov     h1, v0.h[1]
+        mov     s1, v0.s[1]
         str     h0, [x0]
         str     h1, [x0, #2]
         ret
2021-09-01 16:49:20 -07:00
Stanislav Mekhanoshin f3645c792a [AMDGPU] Use S_BITCMP1_* to replace AND in optimizeCompareInstr
Differential Revision: https://reviews.llvm.org/D109082
2021-09-01 15:59:12 -07:00
Stanislav Mekhanoshin bf77b11277 [AMDGPU] Introduce optimizeCompareInstr
The following patterns are currently handled:

s_cmp_eq_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_eq_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_eq_u64 (s_and_b64 $src, 1), 1 => s_and_b64 $src, 1
s_cmp_ge_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_ge_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_lg_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_lg_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_lg_u64 (s_and_b64 $src, 1), 0 => s_and_b64 $src, 1
s_cmp_gt_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_gt_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1

Differential Revision: https://reviews.llvm.org/D109031
2021-09-01 15:57:05 -07:00
Alina Sbirlea a10409fe23 [MemorySSAUpdater] Simplify updates when only deleting edges.
When performing only edge deletion, we don't need to do the DT updates
back and forth. Check for the existance of insert updates to simplify
this.
2021-09-01 15:48:20 -07:00
Roman Lebedev f5753125f0
[Codegen][TLI][X86] SimplifyMultipleUseDemandedBits(): 0'th vec subreg widening is free, try to perform it earlier
I believe, the profitability reasoning here is correct
"sub"reg is already located within the 0'th subreg of wider reg,
so if we have suvector insertion at index 0 into undef,
then it's always free do to.

After this, D109065 finally avoids the regression in D108382.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109074
2021-09-02 00:54:05 +03:00
Fangrui Song 68745a557e [InstrProfiling] Use llvm.compiler.used if applicable for Mach-O
Similar to D97585.

D25456 used `S_ATTR_LIVE_SUPPORT` to ensure the data variable will be retained
or discarded as a unit with the counter variable, so llvm.compiler.used is
sufficient. It allows ld to dead strip unneeded profc and profd variables.

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D105445
2021-09-01 14:46:51 -07:00
Alexander Pivovarov 4b04d54206 [RISCV] Fix typo in RISCVSchedSiFive7.td
Fix typo in "microarchitecure".

Differential Revision: https://reviews.llvm.org/D109006
2021-09-01 16:39:48 -05:00
David Green 49476a4d66 [ARM] Add MVE lowering for fptosi.sat
This adds lowering of the llvm.fptosi.sat and llvm.fptoui.sat intinsics,
selecting a VCVT instruction which under MVE will inherently perform the
saturate.

Differential Revision: https://reviews.llvm.org/D107865
2021-09-01 22:38:47 +01:00
Arthur Eubanks 39f780b51d [OpaquePtr] Cleanup some uses of getPointerElementType() in TailRecursionElimination 2021-09-01 14:24:47 -07:00
Philip Reames 73b951a7f7 [SCEV] Clarify requirements for zero-stride to be UB
There's a silent bug in our reasoning about zero strides. We assume that having a single static exit implies that if that exit is not taken, then the loop must be infinite. This ignores the potential for abnormal exits via exceptions. Consider the following example:

for (uint_8 i = 0; i < 1; i += 0) {
  throw_on_thousandth_call();
}

Our reasoning is such that we'd conclude this loop can't take the backedge as that would lead to a (presumed) infinite loop.

In practice, this is a silent bug because the loopIsFiniteByAssumption returns false strictly more often than the loopHaNoAbnormalExits property. We could reasonable want to change that in the future, so fixing the codeflow now is worthwhile.

Differential Revision: https://reviews.llvm.org/D109029
2021-09-01 14:01:13 -07:00
alex-t e3cbf1d437 [AMDGPU] enable scalar compare in truncate selection
Currently, the truncate selection dag node is expanded as a bitwise AND plus compare to 1.  This change enables scalar comparison in the pattern if the truncate node is uniform.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108925
2021-09-01 23:35:11 +03:00
Philip Reames 3af8a11bc6 [LoopDeletion] Separate logic in breakBackedgeIfNotTaken using symboic max trip count [nfc]
As mentioned in D108833, the logic for figuring out if a backedge is dead was somewhat interwoven with the SCEV based logic and the symbolic eval logic. This is my attempt at making the code easier to follow.

Note that this is only NFC after the work done in 29fa37ec.  Thanks to Nikita for catching that case.

Differential Revision: https://reviews.llvm.org/D108848
2021-09-01 13:30:46 -07:00
Nikita Popov 7f058ce8c2 [WebAssembly] Support opaque pointers in FixFunctionBitcasts
With opaque pointers, no actual bitcasts will be present. Instead,
there will be a mismatch between the call FunctionType and the
function ValueType. Change the code to collect CallBases
specifically (rather than general Uses) and compare these types.

RAUW is no longer performed, as there would no longer be any
bitcasts that can be RAUWd.

Differential Revision: https://reviews.llvm.org/D108880
2021-09-01 22:17:24 +02:00
Philip Reames e735f2bf37 [SCEVExpander] Prefer pointer expansion for overflow checks
We'd special cased this logic to use pointer types for non-integral pointers, but there's no reason we can't do that for all pointer types.   Doing it this was has a few advantages:
a) The code itself becomes more straight forward, and easier to test.
b) We avoid introducing ptrtoint into programs which didn't have them in the source.
c) The resulting codegen is easier to analyze and simplify (mostly due to lack of ptrtoint).

Note that there are some test diffs, but a) running them through instcombine helps a ton, and b) there's enough missing obvious transforms on both before and after IR that it's clear this isn't performance sensitive.

This is mostly motivated by cleaning up mentions of non-integrals to have a clearer idea of what we actually need to support.

Differential Revision: https://reviews.llvm.org/D104662
2021-09-01 13:11:25 -07:00
Craig Topper ccbb4c8b4f [RISCV] Fold (RISCVISD::SELECT_CC X, Y, CC, Z, Z) -> Z.
If the true and false values are the same, we don't need a SELECT_CC.

This would normally be folded before a select is legalized to
select_cc. The test case exploits the late legalization of vscale
to trigger a case where they become identical after legalization.

This works around an issue found on a test case in D107957. In that
case the true/false values were both eventually 0 and the select was
used by a vector AVL operand. The select_cc got expanded to control
flow and a phi, but the phi inputs were both copies from X0. MachineIR
optimizations simplified this to a single copy from X0 going into the
vector instruction. This became the input of a vsetvli after vsetvli
insertion. Then register coalescing folded the copy into the vsetvli.
X0 as the source of a vsetvli is a special encoding and should not be
created by coalesing. We need to fix our vsetvli handling to make sure
this can never happen any other way, but removing the unneeded select
is still a worthwhile optimization.
2021-09-01 12:37:52 -07:00
Hongtao Yu f4711e0d00 [CSSPGO] Sort function offset table to speed up profile loading.
With the context split work, the context-based (an array of strings) sorting performed at profile load time is way more expansive than single-string-based sorting. This is likely due to auxiliary operations done on each array element, such as indirect references, std::min operations, also likely cache misses. In this change I'm presorting profiles during profile generation time to avoid sorting at compile time.

Compared to the previous context-split work, this effectively cuts down compile time by 20% for one of our large services and brings us closer to non-CS build, with still a small gap in build time.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D109036
2021-09-01 12:17:48 -07:00
Nikita Popov 02f74eadbe [IVDescriptors] Make pointer inductions compatible with opaque pointers
Store the used element type in the InductionDescriptor. For typed
pointers, it remains the pointer element type. For opaque pointers,
we always use an i8 element type, such that the step is a simple
offset.

A previous version of this patch instead tried to guess the element
type from an induction GEP, but this is not reliable, as the GEP
may be hidden (see @both in iv_outside_user.ll).

Differential Revision: https://reviews.llvm.org/D104795
2021-09-01 21:02:05 +02:00
Philip Reames 29fa37ec9f [SCEV] If max BTC is zero, then so is the exact BTC [2 of 2]
This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remain cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch.

The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because the happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll

Differential Revision: https://reviews.llvm.org/D109015
2021-09-01 11:51:48 -07:00
Alexander Yermolovich 779d24e151 [DWARF] Find offset of attribute.
This is used by BOLT to do patching of DebugInfo section, and Line Table. Directly by using find, and through getAttrFieldOffsetForUnit.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D107874
2021-09-01 11:49:55 -07:00
Sanjay Patel 8c7a7e1f67 [InstCombine] allow more min/max with 'not' folds for intrinsics
isFreeToInvert allows min/max with 'not' on both operands,
so easing the argument restriction catches the case where
that operand has one use.

We already handle the sub-patterns when there are less uses:
https://alive2.llvm.org/ce/z/8Jatm_

...but this is another step towards parity with the
equivalent icmp+select idioms ( D98152 ).

Differential Revision: https://reviews.llvm.org/D109059
2021-09-01 14:40:00 -04:00
Sanjay Patel 8a10f4a0f6 [InstCombine] use isFreeToInvert to generalize min/max with 'not'
This mimics the code for the corresponding cmp-select idiom.

This also prevents an infinite loop because isFreeToInvert
does not match constant expressions.

So this patch solves the same problem as D108814 and obsoletes
it, but my main motivation is to enhance the pattern matching
to allow more invertible ops. That change will be a follow-up
patch on top of this one.

Differential Revision: https://reviews.llvm.org/D109058
2021-09-01 14:34:22 -04:00
Arthur Eubanks b9b419a13c [NFC] Remove redundant code added in 04ce2de3 2021-09-01 11:30:07 -07:00
Arthur Eubanks 52e6d70c40 [NFC] Use newly introduced *AtIndex methods
Introduced in D108788. These are clearer.
2021-09-01 11:18:41 -07:00
Adrian Prantl 12de296d84 Tighten heuristic for coroutine debug info workaround.
The OutermostLoad condition is supposed to strip the outermost
DW_OP_deref operation because dbg.declares are implicitly
indirect. This patch makes sure the heuristic is only applied to
dbg.declare intrinsics and only if the outermost instruction is a
load.

This was found while qualifying the latest Swift compiler rebranch.

rdar://82037764
2021-09-01 11:15:36 -07:00
Artem Belevich 3af981b065 [IRLinker] Suppress linker warnings when linking with CUDA libdevice.
libdevice bitcode provided by NVIDIA is linked with clang/LLVM-generated IR
which uses nvptx*-nvidia-cuda triple. We need to mark them as compatible.

Differential Revision: https://reviews.llvm.org/D108835
2021-09-01 10:45:15 -07:00
Arthur Eubanks c969349260 [NFC] Rename attribute methods that work with indexes
This is part one of a couple of patches to fully rename these methods.

I've made the mistake of assuming that these indexes are for parameters
multiple times, but actually they're based off of a weird indexing
scheme AttributeList::AttrIndex where 0 is the return value and ~0 is
the function. Hopefully renaming these methods will make this clearer.
Ideally users should use more specific methods like
AttributeList::getFnAttr().

This patch simply adds the name that we want in the end. This is so the
removal of the methods with the original names happens in a separate
change to make it easier for downstream users.

This touches all relevant methods in AttributeList, CallBase, and Function.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D108788
2021-09-01 10:43:14 -07:00
Thomas Lively fec4749200 [WebAssembly] Lower v2f32 to v2f64 extending loads with promote_low
Previously extra wide v4f32 to v4f64 extending loads would be legalized to v2f32
to v2f64 extending loads, which would then be scalarized by legalization. (v2f32
to v2f64 extending loads not produced by legalization were already being emitted
correctly.) Instead, mark v2f32 to v2f64 extending loads as legal and explicitly
lower them using promote_low. This regresses the addressing modes supported for
the extloads not produced by legalization, but that's a fine trade off for now.

Differential Revision: https://reviews.llvm.org/D108496
2021-09-01 10:27:42 -07:00
Hongtao Yu dde162d8a5 [CSSPGO] Fix an access violation due to invalided std::vector pointer invalidation.
std::vector pointers can be invalided while growing. Using std::list instead.
2021-09-01 10:24:17 -07:00
Amara Emerson a86bbe1e31 [AArch64][GlobalISel] Handle any-extending FPR loads in manual selection code.
When we have an any-extending FPR bank load, none of the tablegen patterns
match and we fall back to the C++ selector. Like with the truncating stores
that were fixed recently, the C++ wasn't able to handle it and ended up
generating invalid copies between different size regclasses.

This change adds handling for this case, splitting the load into a regular
load and a SUBREG_TO_REG to extend it into the original wide destination reg.
2021-09-01 10:19:22 -07:00
hsmahesha 97688bfd3d Revert "Revert "Disable ReplaceLDS pass, patch up tests to match""
This reverts commit 5ae6804d17.
2021-09-01 21:52:50 +05:30
Hongtao Yu 7ca8030030 [CSSPGO] Enable loading MD5 CS profile.
Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster.

There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D108342
2021-09-01 09:19:47 -07:00
Nikita Popov 9d720dcb89 [LoadStoreVectorizer] Make aliasing check more precise
The load store vectorizer currently uses isNoAlias() to determine
whether memory-accessing instructions should prevent vectorization.
However, this only works for loads and stores. Additionally, a
couple of intrinsics like assume are special-cased to be ignored.

Instead use getModRefInfo() to generically determine whether the
instruction accesses/modifies the relevant location. This will
automatically handle all inaccessiblememonly intrinsics correctly
(as well as other calls that don't modref for other reasons).
This requires generalizing the code a bit, as it was previously
only considering loads and stored in particular.

Differential Revision: https://reviews.llvm.org/D109020
2021-09-01 18:10:09 +02:00
hsmahesha 5ae6804d17 Revert "Disable ReplaceLDS pass, patch up tests to match"
This reverts commit 50ad3478bd.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D109062
2021-09-01 21:19:39 +05:30
Teresa Johnson badcd58589 [DIArgList] Re-unique after changing operands to fix non-determinism
We have a large compile showing occasional non-deterministic behavior
that is due to DIArgList not being properly uniqued in some cases. I
tracked this down to handleChangedOperands, for which there is a custom
implementation for DIArgList, that does not take care of re-uniquing
after updating the DIArgList Args, unlike the default version of
handleChangedOperands for MDNode.

Since the Args in the DIArgList form the key for the store, this seems
to be occasionally breaking the lookup in that DenseSet. Specifically,
when invoking DIArgList::get() from replaceVariableLocationOp, very
occasionally it returns a new DIArgList object, when one already exists
having the same exact Args pointers. This in turn causes a subsequent
call to Instruction::isIdenticalToWhenDefined on those two otherwise
identical DIArgList objects during a later pass to return false, leading
to different IR in those rare cases.

I modified DIArgList::handleChangedOperands to perform similar
re-uniquing as the MDNode version used by other metadata node types.
This also necessitated a change to the context destructor, since in some
cases we end up with DIArgList as distinct nodes: DIArgList is the only
metadata node type to have a custom dropAllReferences, so we need to
invoke that version on DIArgList in the DistinctMDNodes store to clean
it up properly.

Differential Revision: https://reviews.llvm.org/D108968
2021-09-01 07:04:02 -07:00
Florian Hahn a3d357e504
[FileCheck] Use StringRef for MatchRegexp to fix crash.
If MatchRegexp is an invalid regex, an error message will be printed
using SourceManager::PrintMessage via AddRegExToRegEx.

PrintMessage relies on the input being a StringRef into a string managed
by SourceManager. At the moment, a StringRef to a std::string
allocated in the caller of AddRegExToRegEx is passed. If the regex is
invalid, this StringRef is passed to PrintMessage, where it will crash,
because it does not point to a string managed via SourceMgr.

This patch fixes the crash by turning MatchRegexp into a StringRef If
we use MatchStr, we directly use that StringRef, which points into a
string from SourceMgr. Otherwise, MatchRegexp gets assigned
Format.getWildcardRegex(), which returns a std::string. To extend the
lifetime, assign it to a std::string variable WildcardRegexp and assign
MatchRegexp to a stringref to WildcardRegexp. WildcardRegexp should
always be valid, so we should never have to print an error message
via the SoureMgr I think.

Fixes PR49319.

Reviewed By: thopre

Differential Revision: https://reviews.llvm.org/D109050
2021-09-01 14:27:14 +02:00
Alexander Kornienko 893ac53afc Fix -Wunused-variable 2021-09-01 11:29:30 +02:00
Fraser Cormack 85fd44d7fe [SelectionDAG][NFC] Fix typo in assertion message
s/Uexpected/Unexpected.
2021-09-01 08:55:06 +01:00
Christudasan Devadasan 4dab15288d [AMDGPU] Introduce RC flags for vector register classes
Configure and use the TSFlags in TargetRegisterClass to
have unique flags for VGPR and AGPR register classes.
The vector register class queries like `hasVGPRs` will
now become more efficient with just a bitwise operation.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108815
2021-09-01 02:55:45 -04:00
Kai Luo 5eaebd5d64 [PowerPC] Implement quadword atomic load/store
Add support to load/store i128 atomically.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D105612
2021-09-01 06:55:40 +00:00
Luke a78dd726f4 [SLP][RISCV] Implement unsigned getMinVectorRegisterBitWidth() for RISCV
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108973
2021-09-01 14:25:15 +08:00
Fangrui Song 01152626ab [Linker] Handle comdat nodeduplicate
For a variable in a comdat nodeduplicate, its initializer may be significant.
E.g. its content may be implicitly referenced by another comdat member (or
required to parallel to another comdat member by the runtime when explicit
section is used). We can clone it into an unnamed private linkage variable to
preserve its content.

This partially fixes PR51394 (Sony's proprietary linker using LTO): no error
will be reported. This is partial because we do not guarantee the global
variable order if the runtime has parallel section requirement.

---

There is a similar issue for regular LTO, but unrelated to PR51394:

with lib/LTO (using either ld.lld or LLVMgold.so), linking two modules
with a weak function of the same name, can leave one weak profc and two
private profd, due to lib/LTO's current deficiency that it mixes the two
concepts together: comdat selection and symbol resolution. If the issue
is considered important, we should suppress private profd for the weak+
regular LTO case.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D108879
2021-08-31 22:32:20 -07:00
hsmahesha 98f4713122 [AMDGPU] Split entry basic block after alloca instructions.
While initializing the LDS pointers within entry basic block of kernel(s), make
sure that the entry basic block is split after alloca instructions.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108971
2021-09-01 10:18:44 +05:30
Yonghong Song 89424a829f [DWARF] Support new TAG DW_TAG_LLVM_annotation
A new LLVM specific TAG DW_TAG_LLVM_annotation is added.
The name is suggested by Paul Robinson ([1]).
Currently, this tag is used to output __attribute__((btf_tag("string")))
annotations in dwarf. The following is an example for a global
variable with two btf_tag attributes:
  0x0000002a:   DW_TAG_variable
                  DW_AT_name      ("g1")
                  DW_AT_type      (0x00000052 "int")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/tmp/home/yhs/work/tests/llvm/btf_tag/t.c")
                  DW_AT_decl_line (8)
                  DW_AT_location  (DW_OP_addr 0x0)

  0x0000003f:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_tag")
                    DW_AT_const_value     ("tag1")

  0x00000048:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_tag")
                    DW_AT_const_value     ("tag2")

  0x00000051:     NULL

In the future, DW_TAG_LLVM_annotation may encode other type
of non-string const value.

 [1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151250.html

Differential Revision: https://reviews.llvm.org/D106621
2021-08-31 19:22:17 -07:00
Wang, Pengfei 74043caef2 [X86] Enable half type support in inline assembly constraints
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105799
2021-09-01 09:29:31 +08:00
Petr Hosek 92f54e1c75 [Linker] Support weak symbols in nodeduplicate COMDAT group
When a nodeduplicate COMDAT group contains a weak symbol, choose
a non-weak symbol (or one of the weak ones) rather than reporting
an error. This should address issue PR51394.

With the current IR representation, a generic comdat nodeduplicate
semantics is not representable for LTO. In the linker, sections and
symbols are separate concepts. A dropped weak symbol does not force the
defining input section to be dropped as well (though it can be collected
by GC). In the IR, when a weak linkage symbol is dropped, its associate
section content is dropped as well.

For InstrProfiling, which is where ran into this issue in PR51394, the
deduplication semantic is a sufficient workaround.

Differential Revision: https://reviews.llvm.org/D108689
2021-08-31 17:44:33 -07:00
Joseph Huber 29a74a3915 [OpenMP] Add an option to always inline OpenMP device functions.
Performance on GPU targets can be highly variable, sometimes inlining
everything hurts performance and sometimes it greatly improves it. Add
an option to toggle this behaviour to better investigate it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109014
2021-08-31 18:48:30 -04:00
Kevin Athey 3e2bd82f02 Revert "[OptTable] Improve error message output for grouped short options"
This reverts commit 71d7fed3bc.

Reason: broke sanitizer bots
more info: https://reviews.llvm.org/D108770
2021-08-31 14:06:11 -07:00
Stanislav Mekhanoshin d170945bb2 [RegAlloc] Immediately delete dead instructions with live uses
When RA eliminated a dead def it can either immediately delete
the instruction itself or replace it with KILL to defer the
actual removal. If this instruction has a virtual register use
killing the register it will shrink the LI of the use. However,
if the LI covers the instruction and extends beyond it the
shrink will not happen. In fact that is impossible to shrink
such use because of the KILL still using it.

If later the LI of the use will be split at the KILL and the
KILL itself is eliminated after that point the new live segment
ends up at an invalid slot index.

This extremely rare condition was hit after D106408 which has
enabled rematerialization of such instructions. The replacement
with KILL is only done for rematerialized defs which became dead
and such rematerialization did not generally happen before.

The patch deletes an instruction immediately if it is a result
of rematerialization and has such use. An alternative would be
to prohibit a split at a KILL instruction, but it looks like it
is better to split a live range rather then keeping a killed
instruction just in case it can be rematerialized further.

Fixes PR51655.

Differential Revision: https://reviews.llvm.org/D108951
2021-08-31 13:46:00 -07:00
Nikita Popov 48ebe427c9 [SLPVectorizer] Make aliasing check more precise
SLPVectorizer currently uses AA::isNoAlias() to determine whether
two locations alias. This does not work if one of the instructions
is a call. Instead, we should check getModRefInfo(), which
determines whether an arbitrary instruction modifies or references
a given location.

Among other things, this prevents @llvm.experimental.noalias.scope.decl()
and other inaccessiblmemonly intrinsics from interfering with SLP
vectorization.

Differential Revision: https://reviews.llvm.org/D109012
2021-08-31 22:35:30 +02:00
Nick Desaulniers e9b3f25730 [RISCVISelLowering] avoid emitting libcalls to __mulodi4() and __multi3()
Similar to D108842, D108844, D108926, D108928, and D108936.

__has_builtin(builtin_mul_overflow) returns true for 32b RISCV targets,
but Clang is deferring to compiler RT when encountering long long types.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108939
2021-08-31 11:23:56 -07:00
Nick Desaulniers d8b6ae072d [PPCISelLowering] avoid emitting libcalls to __mulodi4()
Similar to D108842, D108844, and D108926.

__has_builtin(builtin_mul_overflow) returns true for 32b PPC targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks ppc44x_defconfig + CONFIG_BLK_DEV_NBD=y builds of the Linux
kernel that are using builtin_mul_overflow with these types for these
targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support these builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D108936
2021-08-31 11:09:58 -07:00
Joe Nash c96839265a [AMDGPU] Enable ds_min/ds_max on more subtargets
Adds patterns for f64 ds_min/ds_max. Shrinks HasLDSFPAtomics
scope to enable f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108994

Change-Id: Id890b677841ee588b20d42b1bb3f4cdbf6e9ba1a
2021-08-31 13:22:31 -04:00
Jessica Paquette 94d3ff09cf [GlobalISel] Don't use G_FPTOSI in G_ISNAN legalization
As noted in the comments in D108227, using G_FPTOSI produces wrong results for
G_ISNAN. Drop the G_FPTOSI and perform the operation on integer types.

Elsewhere in LLVM, a bitcast would be the appropriate choice (as it is in SDAG).
GlobalISel does not distinguish between integer and FP types, so a bitcast would
be meaningless here.
2021-08-31 10:26:42 -07:00
David Green 22c384129e [ARM] Add missing validForTailPredication for VMINNM/VMAXNM
Apparently this was missing, preventing the generation of tail
predication loops containing VMINNM, VMAXNM, VMINNMA and VMAXNMA.
2021-08-31 18:19:03 +01:00
Philip Reames b604fcb7bc [runtime] Move prolog/epilog block to a post-simplify strategy
The runtime unroller will try to produce a non-loop if the unroll count is 2 and thus the prolog/epilog loop would only run at most one iteration. The old implementation did this by avoiding loop construction entirely. This patches instead constructs the trivial loop and then explicitly breaks the backedge and simplifies. This does result in some additional code churn when triggered, but a) results in better quality code and b) removes a codepath which didn't work properly for multiple exit epilogs.

One oddity that I want to draw to reviewer attention is that this somehow changes revisit order. The new order looks equivalent to me, but I don't understand how creating and erasing an extra loop here creates this effect.

Differential Revision: https://reviews.llvm.org/D108521
2021-08-31 09:29:36 -07:00
Philip Reames 9b45fd909f [AlignFromAssume] Bailout w/non-constant alignments (pr51680)
This is a bailout for pr51680.  This pass appears to assume that the alignment operand to an align tag on an assume bundle is constant.  This doesn't appear to be required anywhere, and clang happily generates non-constant alignments for cases such as this case taken from the bug report:

// clang -cc1 -triple powerpc64-- -S -O1 opal_pci-min.c
extern int a[];
long *b;
long c;
void *d(long, int *, int, long, long, long) __attribute__((__alloc_align__(6)));
void e() {
  b = d(c, a, 0, 0, 5, c);
  b[0] = 0;
}

This was exposed by a SCEV change which allowed a non-constant alignment to reach further into the pass' code.  We could generalize the pass, but for now, let's fix the crash.
2021-08-31 09:20:52 -07:00
Sanjay Patel 6c0181c00f [InstCombine] fix typos in comments; NFC 2021-08-31 12:08:36 -04:00
Philip Reames 6600e1759b [SCEV] If max BTC is zero, then so is the exact BTC [1 of N]
This patch is specifically the howManyLessThan case.  There will be a couple of followon patches for other codepaths.

The subtle bit is explaining why the two codepaths have a difference while both are correct. The test case with modifications is a good example, so let's discuss in terms of it.
* The previous exact bounds for this example of (-126 + (126 smax %n))<nsw> can evaluate to either 0 or 1. Both are "correct" results, but only one of them results in a well defined loop. If %n were 127 (the only possible value producing a trip count of 1), then the loop must execute undefined behavior. As a result, we can ignore the TC computed when %n is 127. All other values produce 0.
* The max taken count computation uses the limit (i.e. the maximum value END can be without resulting in UB) to restrict the bound computation. As a result, it returns 0 which is also correct.

WARNING: The logic above only holds for a single exit loop. The current logic for max trip count would be incorrect for multiple exit loops, except that we never call computeMaxBECountForLT except when we can prove either a) no overflow occurs in this IV before exit, or b) this is the sole exit.

An alternate approach here would be to add the limit logic to the symbolic path. I haven't played with this extensively, but I'm hesitant because a) the term is optional and b) I'm not sure it'll reliably simplify away. As such, the resulting code quality from expansion might actually get worse.

This was noticed while trying to figure out why D108848 wasn't NFC, but is otherwise standalone.

Differential Revision: https://reviews.llvm.org/D108921
2021-08-31 08:50:11 -07:00
gbreynoo 71d7fed3bc [OptTable] Improve error message output for grouped short options
As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current
implementation for parsing grouped short options can return unclear
error messages. This change fixes the example given in the ticket in
which a flag is incorrectly given an argument. Also when parsing a
group we now keep reading past the first incorrect option and output
errors for all incorrect options in the group.

Differential Revision: https://reviews.llvm.org/D108770
2021-08-31 16:41:08 +01:00
Hussain Kadhem 524ded7d01 [VP] implementation of sdag support for VP memory intrinsics
Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105871
2021-08-31 17:01:50 +02:00
Nemanja Ivanovic 84d4ed1761 Revert "[DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item"
This reverts commit 0a6fad754e.
It caused failures on a number of PowerPC bots.
2021-08-31 09:24:50 -05:00
Kuba Mracek 4c066bd08b [GlobalDCE] Handle relative pointers in VFE (for Swift vtables)
To support Virtual Function Elimination to Swift, this PR adds support for Swift
vtables which contain "relative pointers" instead of direct pointer references.
These are in the form of:

@symbol = ... {
  i32 trunc (i64 sub (i64 ptrtoint (<type> @target to i64), i64 ptrtoint (... @symbol to i64)) to i32)
}

The PR extends GlobalDCE's way of looking up a vtable offset into a dependency
to be able to see through this expression and find the target symbol.

Differential Revision: https://reviews.llvm.org/D107645
2021-08-31 07:07:22 -07:00
Sanjay Patel 5d7d689edf [InstCombine] fix propagation of FMF through select-of-fnegs
The existing code was unquestionably wrong - it looked at one
fneg and ignored the other 2 instructions.

It was also untested, so it didn't make the list of bugs
flagged by Alive2.

This is an unusual propagation, but Alive2 agress that we
can intersect the fnegs and union that with the select,
then apply the results to both new instructions:
https://alive2.llvm.org/ce/z/SF8_dt
2021-08-31 09:52:17 -04:00
Sanjay Patel d59ae12d58 [InstCombine] fix typo; NFC 2021-08-31 09:02:14 -04:00
Anton Afanasyev 077d4cb3ab Revert "[SLP]No need to schedule/check parent for extract{element/value} instruction."
Revert since introduced issure reported here:
https://lists.llvm.org/pipermail/llvm-dev/2021-August/152411.html
Discussed starting from here: https://reviews.llvm.org/D108703#2974289

This reverts commit a36bc873a2.
2021-08-31 15:29:06 +03:00
Simon Pilgrim 9e2d14c285 [X86] Copy X86SchedSkylakeServer.td to X86SchedIceLake.td
Icelake, Rocketlake and Tigerlake targets currently use the SkylakeServer scheduler model, despite being a later microarchitecture, leading to both reported bugs (PR48110) and discrepancies when comparing llvm-mca reports to other profiling tools (OSACA, uops, uica, etc.). And tbh I'm getting sick of llvm-mca getting blamed for what are backend scheduler model issues :-(

This patch doesn't attempt to fix any of these discrepancies - there should be no changes in codegen - its a setup patch that copies the skx model, renames all the resources, adds the additional ports (but doesn't reference them yet) and updates the llvm-exegesis pfm counter mappings (based off https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/lib/events/intel_icl_events.h).

This should make it trivial for anyone with hardware access to use llvm-exegesis reports to iteratively improve the model (my attempts to get hold of a cheap tiger lake box haven't been fruitful yet....).

I will copy the SkylakeServer llvm-mca resource tests as follow up commits - the diff should entirely be the resource renames.

Differential Revision: https://reviews.llvm.org/D108914
2021-08-31 11:57:20 +01:00
Simon Wallis f417b660ee [Arm] Add assert in T2 Imm7s code emitter
Add assert to provoke failure in object file output, not just in disassembly output.

Reviewed By: yroux

Differential Revision: https://reviews.llvm.org/D107259
2021-08-31 08:16:48 +01:00
Doug Beck ed6cff667e Fix typo s/beloinging/belonging
Differential Revision: https://reviews.llvm.org/D107099
2021-08-31 12:01:50 +05:30
Alexander Pivovarov eb946cc5b6 Fix typo in comments
Reviewed By: MaskRay, jsji

Differential Revision: https://reviews.llvm.org/D108857
2021-08-31 11:55:40 +05:30
Heejin Ahn 3419e85b15 [WebAssembly] Free setjmpTable before exiting calls in EmSjLj
This is an improvement over D107852. We don't need to enumerate specific
function names; we can just check for `noreturn` attribute. This also
requires us to make sure `__resumeExeption` and `emscripten_longjmp`
have `noreturn` attribute too; one of them is a JS function and the
other calls a JS function so Clang does not have a way to deduce they
don't return.

This is effectively NFC, because I'm not sure if there is an additional
case this case covers; if we add a custom function call that has
`noreturn` attribute, it will be processed within the SjLj handling and
turned into `__invoke` call. So this really applies to some special
functions like `emscripten_longjmp`.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D108955
2021-08-30 21:46:25 -07:00
Heejin Ahn b8fc71b7ae [WebAssembly] Share rethrowing BBs in LowerEmscriptenEHSjLj
There are three kinds of "rethrowing" BBs in this pass:
1. In Emscripten SjLj, after a possibly longjmping function call, we
   check if the thrown longjmp corresponds to one of setjmps within the
   current function. If not, we rethrow the longjmp by calling
   `emscripten_longjmp`.
2. In Emscripten EH, after a possibly throwing function call, we check
   if the thrown exception corresponds to the current `catch` clauses.
   If not, we rethrow the exception by calling `__resumeException`.
3. When both Emscripten EH and SjLj are used, when we check for an
   exception after a possibly throwing function call, it is possible
   that we get not an exception but a longjmp. In this case, we
   shouldn't swallow it; we should rethrow the longjmp by calling
   `emscripten_longjmp`.
4. When both Emscripten EH and SjLj are used, when we check for a
   longjmp after a possibly longjmping function call, it is possible
   that we get not a longjmp but an exception. In this case, we
   shouldn't swallot it; we should rethrow the exception by calling
   `__resumeException`.

Case 1 is in Emscripten SjLj, 2 is in Emscripten EH, and 3 and 4 are
relevant when both Emscripten EH and SjLj are used. 3 and 4 were first
implemented in D106525.

We create BBs for 1, 3, and 4 in this pass. We create those BBs for
every throwing/longjmping function call, along with other BBs that
contain condition checks. What this CL does is to create a single BB
within a function for each of 1, 3, and 4 cases. These BBs are exiting
BBs in the function and thus don't have successors, so easy to be shared
between calls.

The names of BBs created are:
Case 1: `call.em.longjmp`
Case 3: `rethrow.exn`
Case 4: `rethrow.longjmp`

For the case 2 we don't currently create BBs; we only replace the
existing `resume` instruction with `call @__resumeException`. And Clang
already creates only a single `resume` BB per function and reuses it,
so we don't need to optimize this case.

Not sure what are good benchmarks for EH/SjLj, but this decreases the
size of the object file for `grfmt_jpeg.bc` (presumably from opencv) we
got from one of our users by 8.9%. Even after running `wasm-opt -O4` on
them, there is still 4.8% improvement.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D108945
2021-08-30 21:44:34 -07:00
Hongtao Yu b9db70369b [CSSPGO] Split context string to deduplicate function name used in the context.
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.

A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is  time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.

When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.

Testing
This is no-diff change in terms of code quality and profile content (for text profile).

For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.

The compile time of ads is dropped by 25%.

Differential Revision: https://reviews.llvm.org/D107299
2021-08-30 20:09:29 -07:00
Keno Fischer ea8539111d [COFF] Force Symbols containing '.' to be quoted
In D87099, the mangler learned to quote export directives that contain
special characters. Only alhpanumerical characters as well as
'_', '$', '.' and '@' were exmpt from this quoting. However, at least
binutils considers an unquoted '.' to be syntax and object files
containing such symbols will cause errors during linking. Fix that
by removing '.' from the list of allowed exemptions.

Differential Revision: https://reviews.llvm.org/D100359
2021-08-30 17:26:57 -04:00
Artem Belevich 30dfd3449e [MemCpyOpt] Allow specifying --enable-memcpyopt-without-libcalls more than once
so we can override it via clang's CLI if necessary.
2021-08-30 13:55:55 -07:00
Andrew Litteken c58d4c4bd3 [IROutliner] Changing outliner to prioritize reductions on assembly rather than IR instruction
Currently, the IROutliner uses a simple metric to outline the largest amount
of IR possible to outline first if it fits the cost model. This is model
loses out on smaller blocks of code that have higher reductions in cost that
are contained within larger blocks of IR.

This reverses the order, where we calculate all of the costs first, and then
reorder and extract items based on the calculated results.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106440
2021-08-30 13:43:08 -07:00
Nikita Popov c1b7540645 [TTI] Sink IVDescriptors.h include (NFC)
Forward declare RecurrenceDescriptor and include IVDescritor.h
only in implementation code that actually needs it.
2021-08-30 22:41:58 +02:00
Craig Topper 201f6446da [LegalizeTypes][X86] Improve ExpandIntRes_FP_TO_SINT/ExpandIntRes_FP_TO_UINT when input is SoftPromoteHalf.
Instead of splitting off the fp16 to float conversion and generating
a libcall, we should split the operation into fp16 to float and float
to integer operations. This will allow the float to integer conversion
to go through any custom handling the target has. If the target doesn't
have custom handling then we should come back to ExpandIntRes_FP_TO_SINT/
ExpandIntRes_FP_TO_UINT automatically to create the libcall.

This avoids generating libcalls on 32-bit X86. These library functions may
not exist in 32-bit libgcc. At least for LLVM, we never generate them when
hardware floating point instructions are available.

Differential Revision: https://reviews.llvm.org/D108933
2021-08-30 13:12:59 -07:00
Bjorn Pettersson 789f01283d [SelectionDAG] Fix miscompile bugs related to smul.fix.sat with scale zero
When expanding a SMULFIXSAT ISD node (usually originating from
a smul.fix.sat intrinsic) we've applied some optimizations for
the special case when the scale is zero. The idea has been that
it would be cheaper to use an SMULO instruction (if legal) to
perform the multiplication and at the same time detect any overflow.
And in case of overflow we could use some SELECT:s to replace the
result with the saturated min/max value. The only tricky part
is to know if we overflowed on the min or max value, i.e. if the
product is positive or negative. Unfortunately the implementation
has been incorrect as it has looked at the product returned by the
SMULO to determine the sign of the product. In case of overflow that
product is truncated and won't give us the correct sign bit.

This patch is adding an extra XOR of the multiplication operands,
which is used to determine the sign of the non truncated product.

This patch fixes PR51677.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D108938
2021-08-30 22:08:26 +02:00
Owen Anderson db9de22f2b Teach the AArch64 backend patterns to generate the EOR3 instruction.
Adds patterns to match the EOR3 instruction.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108793
2021-08-30 20:01:08 +00:00
Chih-Ping Chen 070090cfa5 [DebugInfo] Remove the restriction on the size of DIStringType
in DebugHandlerBase::isUnsignedDIType.

Differential Revision: https://reviews.llvm.org/D108559
2021-08-30 15:36:54 -04:00
Ellis Hoag 47b239eb5a [DIBuilder] Do not replace empty enum types
It looks like this array was missed in 4276d4a8d0

Fixed tests that expected `elements` to be empty or depeneded on the order of the empty DINode.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D107024
2021-08-30 12:33:03 -07:00
David Green efa340fbd2 [ARM] Workaround tailpredication min/max costmodel
The min/max intrinsics are not yet canonical, but when they are the tail
predications analysis will change from treating them like icmp to
treating them like intrinsics. Unfortunately, they can currently produce
better code by not being tail predicated thanks to the vectorizer picking
higher VF's and the backend folding to better instructions (especially
for saturate patterns). In the long run we will need to improve the
vectorizers cost modelling, recognizing the instruction directly, but in
the meantime this treats min/max as before to prevent performance
regressions.
2021-08-30 19:19:51 +01:00
Nikita Popov 0529e2e018 [InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI)
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger 32-bit. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.

This is a followup to D108076.

Differential Revision: https://reviews.llvm.org/D108875
2021-08-30 19:46:04 +02:00
Mikhail Goncharov 5097b6e352 Revert "[SLP]Improve graph reordering."
This reverts commit 84cbd71c95.

This commit breaks one of the internal tests. As agreed with Alexey I
will provide the reproducer later.
2021-08-30 19:16:44 +02:00
Hongtao Yu f39256e3a5 [CSSPGO] Avoid repeatedly computing md5 hash code for pseudo probe inline contexts.
Md5 hashing is expansive. Using a hash map to look up already computed GUID for dwarf names. Saw a 2% build time improvement on an internal large application.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D108722
2021-08-30 10:11:47 -07:00
Nikita Popov 881677b58a [AsmParser] Support %ty* in force-opaque-pointers mode
Only enforce that ptr* is illegal if the base type is a simple type,
not when it is something like %ty, where %ty may resolve to an
opaque pointer in force-opaque-pointers mode.

Differential Revision: https://reviews.llvm.org/D108876
2021-08-30 19:05:00 +02:00
Andrei Elovikov 1724a16437 [NFC][clang] Move IR-independent parts of target MV support to X86TargetParser.cpp
...that is located under llvm/lib/Support/.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D108423
2021-08-30 09:48:48 -07:00
Andrew Litteken f564299fe9 [IROutliner] Ensure instructions at end of candidate are excluded
Occasionally instructions are between the last instruction in a region,
and the following instruction as identified by the Candidate.  This
adds an extra check right before splitting a candidate that excludes the region from being split/checked for outlining to remove errors.

Tests Added:
Tranforms/IROuutliner/outlining-extra-bitcasts.ll

Reviewer: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D104142
2021-08-30 09:30:26 -07:00
Kazu Hirata c50faffb4e [llvm] Remove redundant calls to str() and c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-08-30 09:05:05 -07:00
Craig Topper 0560a4adb3 [RISCV] Enable CONCAT_VECTORS for fixed FP vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D108487
2021-08-30 08:47:45 -07:00
Craig Topper 705d005781 [DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate.
The check for whether a rotate is possible occurs before the
memory legality checks for the integer type. So it's possible we
decide we can use a rotate, but then fail the legality checks. If
that happens we should not fall back to a vector type. This triggers
an assertion in the rotate handling when it finds a vector type
instead of an integer type.

In theory we could use a shufflevector in place of the rotate, but
right now I'd just like to fix the crash.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108839
2021-08-30 08:47:15 -07:00
Andrew Litteken cf56b08d15 [IRSim] Adding missing comments canonical relation commit
Adding missing comments to IRSimilarityIdentifier.cpp since
they were not properly added in commit 063af63b96.
2021-08-30 08:41:05 -07:00
Jake Egan 57b46056b9 [AIX] Suppress -Waix-compat warning with SmallVector class
When building LLVM with Open XL and -Werror is specified, the -Waix-compat warning becomes an error. This patch updates the SmallVector class to suppress the -Waix-compat warning/error on AIX.

Reviewed By: daltenty

Differential Revision: https://reviews.llvm.org/D108577
2021-08-30 10:59:47 -04:00
Aaron Ballman 21d11c87a2 Silence a signed/unsigned mismatch warning; NFC 2021-08-30 08:51:08 -04:00
Djordje Todorovic 86f5288eae [LiveDebugValues] Cleanup Transfers when removing Entry Value
If we encounter a new debug value, describing the same parameter,
we should stop tracking the parameter's Entry Value. At that point,
in some cases, the Transfer which uses the parameter's Entry Value,
is already emitted. Thanks to the RemoveRedundantDebugValues pass,
many problems with incorrect instruction order and number of DBG_VALUEs
are fixed. However, we still cannot rely on the rule that each new
debug value is set by the previous non-debug instruction in Machine
Basic Block.

When new parameter debug value triggers removal of Backup Entry Value
for the same parameter, do the cleanup of Transfers emitted from Backup
Entry Values. Get the Transfer Instruction which created the new debug
value and search for debug values already emitted from the to-be-deleted
Backup Entry Value and attached to the Transfer Instruction. If found,
delete the Transfer and remove "primary" Entry Value Var Loc from
OpenRanges.

This patch fixes PR47628.

Patch by Nikola Tesic.

Differential revision: https://reviews.llvm.org/D106856
2021-08-30 14:00:41 +02:00
Simon Pilgrim af2920ec6f [TTI][X86] getArithmeticInstrCost - move opcode canonicalization before all target-specific costs. NFCI.
The GLM/SLM special cases still get tested first but after the the MUL/DIV/REM pattern detection - this will be necessary for when we make the SLM vXi32 MUL canonicalization generic to improve PMULLW/PMULHW/PMADDDW cost support etc.
2021-08-30 12:24:59 +01:00
Simon Pilgrim 7c25a32840 Fix MSVC "signed/unsigned mismatch" comparison warning. NFCI. 2021-08-30 12:11:09 +01:00
Roman Lebedev 795d142d23
[NFCI][IndVars] rewriteLoopExitValues(): don't expand SCEV's until needed
Previously, we'd expand *ALL* the SCEV's eagerly, because we needed to
check with `isValidRewrite()`, and discard bad rewrite candidates,
but now that we do not do that, we also don't need to always expand.

In particular, this avoids expanding potentially-huge SCEV's that we
would discard anyways because they are high-cost and we aren't
rewriting aggressively.
2021-08-30 12:28:24 +03:00
Roman Lebedev 7b0d59da9a
[IndVars] Drop check for the validity of rewrite
`isValidRewrite()` checks that the both the original SCEV,
and the rewrite SCEV have the same base pointer.
I //believe//, after all the recent SCEV improvements,
this invariant is already enforced by SCEV itself.

I originally tried changing it into an assert in D108043,
but that showed that it triggers on e.g. https://reviews.llvm.org/D108043#2946621,
where SCEV manages to forward the store to load,
test added.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D108655
2021-08-30 12:06:58 +03:00
“bhkumarn” 0a6fad754e [DebugInfo] Emit DW_TAG_namelist and DW_TAG_namelist_item
This patch emits DW_TAG_namelist and DW_TAG_namelist_item for fortran
namelist variables. DICompositeType is extended to support this fortran
feature.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D108553
2021-08-30 13:40:39 +05:30
Florian Hahn abd36fe512
[VPlan] Introduce code to limit querying VPValues using IR references.
After applying VPlan-to-VPlan transformations, using IR references to
query VPlan values may be incorrect, as the IR is not in sync with the
VPlan any longer.

To better detect such mis-matches, this patch introduces a new flag to
VPlans to indicate whether it is safe to query VPValues using IR values.

getVPValue is updated to assert if it is called when the flag indicates
it is not safe any longer.

There is an escape hatch via an extra argument, because there are 3
places that need to be fixed first. Those are

1. truncateToMinimalBitwidths
2. clearReductionWrapFlags
3. fixLCSSAPHIs

As a first step, this flag will help preventing new code from violating
this property.

Any suggestions with respect to naming very welcome!

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D108573
2021-08-30 09:12:09 +02:00
Wang, Pengfei ab40dbfe03 [X86] AVX512FP16 instructions enabling 6/6
Enable FP16 complex FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105269
2021-08-30 13:08:45 +08:00
Qiu Chaofan 3bdd850d0c [PowerPC] Set branch/call instructions as no hasSideEffects
PowerPC can model these instructions, so we don't need this flag set.

Reviewed By: shchenz, jsji

Differential Revision: https://reviews.llvm.org/D71983
2021-08-30 12:23:35 +08:00
Arthur Eubanks 099e4bcd5d [InstCombine] Remove invariant group intrinsincs when comparing against null
We cannot leak any equivalency information by comparing against null
since null never has virtual metadata associated with it (when null is
not a valid dereferenceable pointer).

Instcombine seems to make sure that a null will be on the RHS, so we
don't have to check both operands.

This fixes a missed optimization in llvm-test-suite's MultiSource lambda
benchmark under -fstrict-vtable-pointers.

Reviewed By: Prazek

Differential Revision: https://reviews.llvm.org/D108734
2021-08-29 15:45:25 -07:00
Nikita Popov 9f7873784d [SCEVExpander] Reuse removePointerBase() for canonical addrecs
ExposePointerBase() in SCEVExpander implements basically the same
functionality as removePointerBase() in SCEV, so reuse it.

The SCEVExpander code assumes that the pointer operand on adds is
the last one -- I'm not sure that always holds. As such this might
not be strictly NFC.
2021-08-29 21:12:35 +02:00
Nikita Popov 0886fd5b3a [SCEVExpander] Remove unnecessary mul/udiv check (NFC)
Pointer-typed SCEV expressions can no longer be mul or udiv, so
we do not need to specially handle them here.
2021-08-29 20:47:00 +02:00
Nikita Popov 3f162e8e6d [SCEVExpander] Assert single pointer op in add (NFC)
There can only be one pointer operand in an add expression, and
we have sorted operands to guarantee that it is the first. As
such, the pointer check for other operands is dead code.
2021-08-29 20:30:56 +02:00
Nikita Popov e6a5dd60ff [SCEV] Assert unique pointer base (NFC)
Add expressions can contain at most one pointer operand nowadays,
assert that in getPointerBase() and removePointerBase().
2021-08-29 20:06:24 +02:00
Vince Bridgers 55ba1de7c5 [X86] Remove X86LowerAMXType::getRowFromCol from X86LowerAMXType.cpp
Remove method X86LowerAMXType::getRowFromCol since it's not used, and
it's causing a warning.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D108862
2021-08-29 12:27:34 -05:00
Kazu Hirata 96d3294555 [Support] Remove redundant calls to str() and c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-08-29 09:14:48 -07:00
Nikita Popov b28c3b9d9f [NewPM] Add missing LTO ArgPromotion pass
This is a followup to D96780 to add one more pass missing from the
NewPM LTO pipeline. The missing ArgPromotion run is inserted at
the same position as in the LegacyPM, resolving the already
present FIXME:
16086d47c0/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp (L1096-L1098)

The compile-time impact is minimal with ~0.1% geomean regression
on CTMark.

Differential Revision: https://reviews.llvm.org/D108866
2021-08-29 12:40:29 +02:00
Yonghong Song 4948927058 [BPF] support btf_tag attribute in .BTF section
A new kind BTF_KIND_TAG is added to .BTF to encode
btf_tag attributes. The format looks like
   CommonType.name : attribute string
   CommonType.type : attached to a struct/union/func/var.
   CommonType.info : encoding BTF_KIND_TAG
                     kflag == 1 to indicate the attribute is
                     for CommonType.type, or kflag == 0
                     for struct/union member or func argument.
   one uint32_t    : to encode which member/argument starting from 0.

If one particular type or member/argument has more than one attribute,
multiple BTF_KIND_TAG will be generated.

Differential Revision: https://reviews.llvm.org/D106622
2021-08-28 21:02:27 -07:00
Fangrui Song 510e106fa8 [Linker] Replace comdat based bool LinkFromSrc with enum class LinkFrom and improve nodeduplicate tests. NFC
This is different from symbol resolution based LinkFromSrc.  Rename to be
clearer.

In the future we may support a new enum member 'Both' for nodeduplicate. This is
feasible (by renaming to a private linkage GlobalValue), but we need to be
careful not to break InstrProfiling.cpp's expectation of parallel profd/profc.

The challenge is that current LTO symbol resolution only allows to mark one
profc as prevailing: the other profc in another comdat nodeduplicate may be
discarded while its associated profd isn't.
2021-08-28 13:56:32 -07:00
Kazu Hirata 0003d57434 [Analysis] Fix a "set but not used" warning 2021-08-28 06:37:01 -07:00
Nikita Popov 16086d47c0 [WebAssembly] Fix FastISel of condition in different block (PR51651)
If the icmp is in a different block, then the register for the icmp
operand may not be initialized, as it nominally does not have
cross-block uses. Add a check that the icmp is in the same block
as the branch, which should be the common case.

This matches what X86 FastISel does:
5b6b090cf2/llvm/lib/Target/X86/X86FastISel.cpp (L1648)

The "not" transform that could have a similar issue is dropped
entirely, because it is currently dead: The incoming value is
a branch or select condition of type i1, but this code requires
an i32 to trigger.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51651.

Differential Revision: https://reviews.llvm.org/D108840
2021-08-28 10:28:24 +02:00
luxufan 89f546f6ba [JITLink][RISCV] Support GOT/PLT relocations
This patch add the R_RISCV_GOT_HI20 and R_RISCV_CALL_PLT relocation support. And the basic got/plt was implemented. Because of riscv32 and riscv64 has different pointer size, the got entry size and instructions of plt entry is different. This patch is the basic support, the optimization pass at preFixup stage has not been implemented.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D107688
2021-08-28 11:52:21 +08:00
Nick Desaulniers c8c176d999 [MipsISelLowering] avoid emitting libcalls to __mulodi4()
__has_builtin(__builtin_mul_overflow) returns true for 32b MIPS targets,
but Clang is deferring to compiler RT when encountering `long long`
types. This breaks sanitizer builds of the Linux kernel that are using
__builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support malta_defconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108844
2021-08-27 15:15:36 -07:00
Nick Desaulniers 5c91b98c5d [ARMISelLowering] avoid emitting libcalls to __mulodi4()
__has_builtin(__builtin_mul_overflow) returns true for 32b ARM targets,
but Clang is deferring to compiler RT when encountering `long long`
types. This breaks sanitizer builds of the Linux kernel that are using
__builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support allmodconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108842
2021-08-27 15:14:47 -07:00
Andrew Litteken 063af63b96 [IRSim][IROutliner] Canonicalizing commutative value numbering between similarity sections.
When the initial relationship between two pairs of values between
similar sections is ambiguous to commutativity, arguments to the
outlined functions can be passed in such that the order is incorrect,
causing miscompilations.  This adds a canonical mapping to each
similarity section, so that we can maintain the relationship of global
value numbering from one section to another.

Added Tests:
Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
unittests/Analysis/IRSimilarityIdentifierTest.cpp - IRSimilarityCandidate:CanonicalNumbering

Reviewers: jroelofs, jpaquette, yroux

Differential Revision: https://reviews.llvm.org/D104143
2021-08-27 15:02:56 -07:00
Johannes Doerfert 56e372b56e [Attributor][NFC] Silence unused variable warning 2021-08-27 16:38:13 -05:00
Nikita Popov 757409da7a [MergeICmps] Ignore clobbering instructions before the loads
This is another followup to D106591. Even if there is an
instruction that clobbers one of the loads, this doesn't matter if
it happens before the loads. Those instructions aren't affected by
the transform at all.

The gep-references-bb.ll is modified to preserve the spirit of the
test, as the store to @g no longer impacts the transform.

Differential Revision: https://reviews.llvm.org/D108782
2021-08-27 23:31:35 +02:00
Philip Reames c7b25e4359 [LoopDeletion] Use max trip count to break backedge in addition to exact one
We'd added support a while back from breaking the backedge if SCEV can prove the trip count is zero. However, we used the exact trip count which requires *all* exits be analyzeable. I noticed while writing test cases for another patch that this disallows cases where one exit is provably taken paired with another which is unknown. This patch adds the upper bound case.

We could use a symbolic max trip count here instead, but we use an isKnownNonZero filter (presumably for compile time?) for the first-iteration reasoning. I decided this was a more obvious incremental step, and we could go back and untangle the schemes separately.

Differential Revision: https://reviews.llvm.org/D108833
2021-08-27 14:19:44 -07:00
Valentin Churavy 4cacb5cad0
[MergeICmps] Don't merge icmps derived from pointers with addressspaces
IIUC we can't emit `memcmp` between pointers in addressspaces,
doing so will trigger an assertion since the signature of the memcmp
will not match it's arguments (https://bugs.llvm.org/show_bug.cgi?id=48661).

This PR disables the attempt to merge icmps,
when the pointer is in an addressspace.

Reviewed By: #julialang, vtjnash

Differential Revision: https://reviews.llvm.org/D94813
2021-08-27 22:15:02 +02:00
Craig Topper dbf0d8118c [RISCV] Use ~0ULL instead of ~0U when checking for invalid ErrorInfo.
ErrorInfo is a uint64_t and is initialized to all 1s.

Not sure how to test this. Noticed while working on .insn support.
2021-08-27 12:30:33 -07:00
Haowei Wu 31e61c58b0 [ifs] Add option to hide undefined symbols
This change add an option to llvm-ifs to hide undefined symbols from
its output.

Differential Revision: https://reviews.llvm.org/D108428
2021-08-27 11:15:56 -07:00
Johannes Doerfert e05940de2a [Attributor][FIX] Recursion via memory needs to be tracked explicitly
Recursion can happen when we see a PHI use the second time or when we
look at a store value operand use again. We already visited the
potential copies and doing so again will just cause endless looping.

Reviewed By: kuter

Differential Revision: https://reviews.llvm.org/D108190
2021-08-27 13:12:13 -05:00
Johannes Doerfert caa3b28260 [Attributor][FIX] Do not treat byval args as local memory (for now)
For now we do should not treat byval arguments as local copies performed
on the call edge, though, in general we should. To make that happen we
need to teach various passes, e.g., DSE, about the copy effect of a
byval. That would also allow us to mark functions only accessing byval
arguments as readnone again, atguably their acceses have no effect
outside of the function, like accesses to allocas.

Reviewed By: kuter

Differential Revision: https://reviews.llvm.org/D108140
2021-08-27 13:12:11 -05:00
Philip Reames 6a82376012 Special case common branch patterns in breakLoopBackedge (try 2)
Changes since aec08e:
* Adjust placement of a closing brace so that the general case actually runs.  Turns out we had *no* coverage of the switch case.  I added one in eae90fd.
* Drop .llvm.loop.* metadata from the new branch as there is no longer a loop to annotate.

Original commit message:

This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
2021-08-27 10:27:16 -07:00
Roman Lebedev 6734018041
[Codegen][X86] EltsFromConsecutiveLoads(): if only have AVX1, ensure that the "load" is actually foldable (PR51615)
This fixes another reproducer from https://bugs.llvm.org/show_bug.cgi?id=51615
And again, the fix lies not in the code added in D105390

In this case, we completely don't check that the "broadcast-from-mem" we create
can actually fold the load. In this case, it's operand was not a load at all:
```
Combining: t16: v8i32 = vector_shuffle<0,u,u,u,0,u,u,u> t14, undef:v8i32
Creating new node: t29: i32 = undef
RepeatLoad:
t8: i32 = truncate t7
  t7: i64 = extract_vector_elt t5, Constant:i64<0>
    t5: v2i64,ch = load<(load (s128) from %ir.arg)> t0, t2, undef:i64
      t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t1: i64 = Register %0
      t4: i64 = undef
    t3: i64 = Constant<0>
Combining: t15: v8i32 = undef

```

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108821
2021-08-27 20:26:53 +03:00
Philipp Krones 54e8cae565 [MC][RISCV] Add RISCV MCObjectFileInfo
This makes sure, that the text section will have a 2-byte alignment, if
the +c extension is enabled.

Reviewed By: MaskRay, luismarques

Differential Revision: https://reviews.llvm.org/D102052
2021-08-27 18:23:29 +01:00
Craig Topper 0eeab8b282 [RISCV] Add -riscv-v-fixed-length-vector-elen-max to limit the ELEN used for fixed length vectorization.
This adds an ELEN limit for fixed length vectors. This will scalarize
any elements larger than this. It will also disable some fractional
LMULs. For example, if ELEN=32 then mf8 becomes illegal, i32/f32
vectors can't use any fractional LMULs, i16/f16 can only use mf2,
and i8 can use mf2 and mf4.

We may also need something for the scalable vectors, but that has
interactions with the intrinsics and we can't scalarize a scalable
vector.

Longer term this should come from one of the Zve* features
2021-08-27 10:17:35 -07:00
Fangrui Song 83dfa0d098 [MC] Change ELFOSABI_NONE to ELFOSABI_GNU for STB_GNU_UNIQUE
Similar to D97976.
On Linux, most GCC installations are configured with
`--enable-gnu-unique-object` and such GCC emits `@gnu_unique_object` assembly.

The feature is highly controversial and disliked by many folks.
(On glibc DF_1_NODELETE is implicitly enabled and makes dlclose a no-op).

In llvm-project STB_GNU_UNIQUE is assembly only. Clang does not use STB_GNU_UNIQUE.

Use ELFOSABI_GNU to match GNU as behavior and avoid collision with other
OSABI binding values.

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D107861
2021-08-27 08:53:55 -07:00
Kazu Hirata 72bbd1559e [IR] Remove getWithOperandReplaced (NFC)
The function hasn't been used for at least 10 years.
2021-08-27 08:42:57 -07:00
Matt Arsenault 1494298b51 GlobalISel: Remove check for empty functions as these are invalid IR 2021-08-27 09:27:06 -04:00
Sanjay Patel 416a119f9e [GlobalOpt] don't hoist constant expressions that can trap
We try to forward a stored-once-constant-value from one global access
to another, but that's not safe if the constant value is an expression
that can trap.

The tests are reduced from the miscompile examples in:
https://llvm.org/PR47578

Differential Revision: https://reviews.llvm.org/D108771
2021-08-27 08:10:20 -04:00
Jun Ma 15b2a8e7fa [AArch64][SVE] Optimize ptrue predicate pattern with known sve register width.
For vectors that are exactly equal to getMaxSVEVectorSizeInBits, just use
AArch64SVEPredPattern::all, which can enable the use of unpredicated ptrue when available.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D108706
2021-08-27 20:03:48 +08:00
Jun Ma 8c47103491 [AArch64][SVE] Add API for conversion between SVE predicate pattern and element number. NFC
This patch solely moves convert operation between SVE predicate pattern
and element number into two small functions. It's pre-commit patch for optimize
pture with known sve register width.

Differential Revision: https://reviews.llvm.org/D108705
2021-08-27 20:03:48 +08:00
Jun Ma 3f919dfe0d [AArch64][SVE] Use getPTrue uniformly.NFC. 2021-08-27 20:03:48 +08:00
Serge Pavlov cdbe569fb6 [X86] Implement llvm.isnan(x86_fp80) as unordered comparison
x86_fp80 format allows values that do not fit any of IEEE-754 category.
Previously they were recognized by intrinsic __builtin_isnan as NaNs.
Now this intrinsic is implemented using instruction FXAM, which
distinguish between NaNs and unsupported values. It can make some
programs behave differently.

As a solution, this fix changes lowering of the intrinsic. If floating
point exceptions are ignored, llvm.isnan is lowered into unordered
comparison, as __buildtin_isnan was implemented earlier. In strictfp
functions the intrinsic is lowered using FXAM, which does not raise
exceptions even for signaling NaN, as required by IEEE-754 and C
standards.

Differential Revision: https://reviews.llvm.org/D108037
2021-08-27 18:06:07 +07:00
Nathan Sidwell 199ac3a839 [NFC][X86] Sret return register cleanup
There are no paths into LowerFormalParms that have already specified
the sret register. We always materialize a virtual and then assign it
to the physical reg at the point of the return.

Differential Revision: https://reviews.llvm.org/D108762
2021-08-27 04:03:49 -07:00
Carl Ritson 5d9de3ea18 [DAGCombine] Allow FMA combine with both FMA and FMAD
Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108619
2021-08-27 19:49:35 +09:00
Ricky Taylor 8d3f112f0c [M68k] Update pointer data layout
Fixes PR51626.

The M68k requires that all instruction, word and long word reads are
aligned to word boundaries. From the 68020 onwards, there is a
performance benefit from aligning long words to long word boundaries.

The M68k uses the same data layout for pointers and integers.

In line with this, this commit updates the pointer data layout to
match the layout already set for 32-bit integers: 32:16:32.

Differential Revision: https://reviews.llvm.org/D108792
2021-08-27 11:47:27 +01:00
Roman Lebedev d4d459e747
[X86] AMD Zen 3: MULX w/ mem operand has the same throughput as with reg op
Exegesis is faulty and sometimes when measuring throughput^-1
produces snippets that have loop-carried dependencies,
which must be what caused me to incorrectly measure it originally.

After looking much more carefully, the inverse throughput should match
that of the MULX w/ reg op.

As per llvm-exegesis measurements.
2021-08-27 13:27:05 +03:00
Roman Lebedev 0f04936a2d
[X86] AMD Zen 3: MULX produces low part of the result in 3cy, +1cy for high part
As per llvm-exegesis measurements.
2021-08-27 13:27:05 +03:00
Lang Hames b749ef9e22 [ORC][ORC-RT] Reapply "Introduce ELF/*nix Platform and runtime..." with fixes.
This reapplies e256445bff, which was reverted in 45ac5f5441 due to bot errors
(e.g. https://lab.llvm.org/buildbot/#/builders/112/builds/8599). The issue that
caused the bot failure was fixed in 2e6a4fce35.
2021-08-27 14:41:58 +10:00
Lang Hames 2e6a4fce35 [ORC][JITLink][ELF] Treat STB_GNU_UNIQUE as Weak in the JIT.
This should fix the bot error in
https://lab.llvm.org/buildbot/#/builders/112/builds/8599
which forced reversion of the ELFNixPlatform in 45ac5f5441.

This should allow us to re-enable the ELFNixPlatform in a follow-up patch.
2021-08-27 14:41:28 +10:00
Matt Arsenault 04ce2de330 AMDGPU: Remove implicit argument attributes when introducing new calls
In a future patch, a new set of amdgpu-no-* attributes will be
introduced to indicate when a function does not need an implicitly
passed input. This pass introduces new instances of these intrinsic
calls, and should remove the attributes if they were present before.
2021-08-26 22:08:04 -04:00
Chen Zheng 324bd467a2 [PowerPC][ELF] make sure local variable space does not overlap with parameter save area
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D105271
2021-08-27 01:58:41 +00:00
Matt Arsenault 088cc63640 AMDGPU: Invert AMDGPUAttributor
Switch to using BitIntegerState for each of the inputs, and invert
their meanings.

This now diverges more from the old AMDGPUAnnotateKernelFeatures, but
this isn't used yet anyway.
2021-08-26 21:32:13 -04:00
Matt Arsenault 3fdcd9bb13 GlobalISel: Add CallBase to CallLoweringInfo
The DAG version has this, and is necessary for call lowering to take
advantage of any attributes at the call site.
2021-08-26 21:09:11 -04:00
Matt Arsenault 46d82e7357 AMDGPU: Restrict attributor transforms
We only really want this to add the custom attributes. Theoretically
the regular transforms were already run at this point. Touching
undefined behavior breaks a lot of tests when this is enabled by
default, many of which are expecting to test handling of undef
operations.
2021-08-26 21:08:51 -04:00
Matt Arsenault cf32d61a05 AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor
amdgpu-calls and amdgpu-stack-objects don't really belong as
attributes, and are currently a hacky way of passing an analysis into
the DAG. These don't really belong in the IR, and don't really fit in
with the other attributes. Remove these to facilitate inverting the
pass.

I don't exactly understand the indirect call test changes. These tests
are using calls which are trivially replacable with a direct call, so
I'm not sure what the point is.
2021-08-26 20:31:14 -04:00
Matt Arsenault 98d7aa435f AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr
We no longer use this intrinsic outside of the backend and no longer
support using it outside of kernels.
2021-08-26 20:30:03 -04:00
Heejin Ahn f5cff292e2 [WebAssembly] Fix PHI when relaying longjmps
When doing Emscritpen EH, if SjLj is also enabled and used and if the
thrown exception has a possiblity being a longjmp instead of an
exception, we shouldn't swallow it; we should rethrow, or relay it. It
was done in D106525 and the code is here:
8441a8eea8/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L858-L898)

Here is the pseudocode of that part: (copied from comments)
```
if (%__THREW__.val == 0 || %__THREW__.val == 1)
  goto %tail
else
  goto %longjmp.rethrow

longjmp.rethrow: ;; This is longjmp. Rethrow it
  %__threwValue.val = __threwValue
  emscripten_longjmp(%__THREW__.val, %__threwValue.val);

tail: ;; Nothing happened or an exception is thrown
  ... Continue exception handling ...
```

If the current BB (where the `invoke` is created) has successors that
has the current BB as its PHI incoming node, now that has to change to
`tail` in the pseudocode, because `tail` is the latest BB that is
connected with the next BB, but this was missing.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D108785
2021-08-26 17:25:26 -07:00
Matt Arsenault ce51c5d4a9 AMDGPU: Fix crashing on kernel declarations when lowering LDS
This was trying to insert the used marker into a declaration.
2021-08-26 19:01:10 -04:00
Yonghong Song 1bebc31c61 [DebugInfo] generate btf_tag annotations for func parameters
Generate btf_tag annotations for function parameters.
A field "annotations" is introduced to DILocalVariable, and
annotations are represented as an DINodeArray, similar to
DIComposite elements. The following example illustrates how
annotations are encoded in IR:
    distinct !DILocalVariable(name: "info",, arg: 1, ..., annotations: !10)
    !10 = !{!11, !12}
    !11 = !{!"btf_tag", !"a"}
    !12 = !{!"btf_tag", !"b"}

Differential Revision: https://reviews.llvm.org/D106620
2021-08-26 14:18:30 -07:00
Kirill Stoimenov a3f4139626 [asan] Implemented flag to emit intrinsics to optimize ASan callbacks.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D108377
2021-08-26 20:33:57 +00:00
Kirill Stoimenov 2e83a0efb9 [asan] Fixed a runtime crash.
Looks like the NoRegister has some effect on the final code that is generated. My guess is that some optimization kicks in at the end?

When I use -S to dump the assembly I get the correct version with 'shrq    $3, %r8':
        movq    %r9, %r8
        shrq    $3, %r8
        movsbl  2147450880(%r8), %r8d

But, when I disassemble the final binary I get RAX in stead of R8:
        mov    %r9,%r8
        shr    $0x3,%rax
        movsbl 0x7fff8000(%r8),%r8d

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D108745
2021-08-26 20:30:25 +00:00
Alexey Bataev 84cbd71c95 [SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.

Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-08-26 12:31:18 -07:00
Andrea Di Biagio 44a13f33be Revert "[MCA][NFC] Remove redundant calls to std::move."
This reverts commit 9cc0023fb8.
due to buildbot failures.
2021-08-26 19:53:17 +01:00
Andrea Di Biagio 9cc0023fb8 [MCA][NFC] Remove redundant calls to std::move.
This fixes some redundant move in return statement [-Wredundant-move] gcc 9.3.0
warnings.

This also fixes a minor coverity issue reported agaist class MCAOperand about
the lack of proper initialization for field Index.

No functional change intended.
2021-08-26 19:47:59 +01:00
Jessica Paquette 2363a20001 [AArch64][GlobalISel] Optimize G_BUILD_VECTOR of undef + 1 elt -> SUBREG_TO_REG
This pattern

```
%elt = ... something ...
%undef = G_IMPLICIT_DEF
%vec = G_BUILD_VECTOR %elt, %undef, %undef, ... %undef
```

Can be selected to a SUBREG_TO_REG, assuming `%elt` and `%vec` have the same
register bank. We don't care about any of the bits in `%vec` aside from those
in `%elt`, which just happens to be the 0th element.

This is preferable to emitting `mov` instructions for every index.

This gives minor code size improvements on the test suite at -Os.

Differential Revision: https://reviews.llvm.org/D108773
2021-08-26 11:45:11 -07:00
Andrea Di Biagio 1eb75362c9 [MCA][RegisterFile] Consistently update the PRF in the presence of multiple writes to the same register.
My last change to the RegisterFile (PR51495) has introduced a bug in the logic
that allocates physical registers in the PRF.

In some cases, this bug could have triggered a nasty unsigned wrap in the number
of allocated registers, thus resulting in mca being stuck forever in a loop of
PRF availability checks.
2021-08-26 19:16:20 +01:00
Craig Topper 1b9417454e [RISCV] Insert a sext_inreg when type legalizing i32 shl by constant on RV64.
Similar to what we do for add/sub/mul.

This can help remove some sext.w. There are some regressions on
some bswap tests, but I have an idea how to fix that for a follow up.

A new PACKW pattern is added to handle the new sext_inreg placement.

Differential Revision: https://reviews.llvm.org/D108663
2021-08-26 10:20:19 -07:00
Yonghong Song 30c288489a [DebugInfo] generate btf_tag annotations for DIGlobalVariable
Generate btf_tag annotations for DIGlobalVariable.
A field "annotations" is introduced to DIGlobalVariable, and
annotations are represented as an DINodeArray, similar to
DIComposite elements. The following example illustrates how
annotations are encoded in IR:
    distinct !DIGlobalVariable(..., annotations: !10)
    !10 = !{!11, !12}
    !11 = !{!"btf_tag", !"a"}
    !12 = !{!"btf_tag", !"b"}

Differential Revision: https://reviews.llvm.org/D106619
2021-08-26 10:03:44 -07:00
Andrew Litteken 9d2c859ebb [CodeExtractor] Making the arguments outlined easier to access from the outside
The Code Extractor does not provide an easy mechanism for determining the
inputs and outputs after extraction has occurred, this patch gives the
ability to pass in empty SetVectors to be filled with the inputs and
outputs if they need to be analyzed.

Added Tests:
- InputOutputMonitoring in unittests/Transforms/Utils/CodeExtractorTests.cpp

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106991
2021-08-26 09:47:53 -07:00
Stanislav Mekhanoshin 827dd17e26 [AMDGPU] Invert partial vgpr to agpr spill lane order
On targets requiring VGPR alignment we may end up spilling an
unaligned register if we were partially spilled odd number of
leading lanes. The reminder will start with an odd register.

This problem is solved by inverting the order of lanes to
be spillied so that we start from the end.

Differential Revision: https://reviews.llvm.org/D108732
2021-08-26 09:39:03 -07:00
Craig Topper 8bb24289f3 [SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants.
We can halve the number of mask constants by masking before shl
and after srl.

This can reduce the number of mov immediate or constant
materializations. Or reduce the number of constant pool loads
for X86 vectors.

I think we might be able to do something similar for bswap. I'll
look at it next.

Differential Revision: https://reviews.llvm.org/D108738
2021-08-26 09:33:24 -07:00
Alexey Bataev b00f73d8bf Revert "[SLP]Improve graph reordering."
This reverts commit a28234e37a to
investigate a compiler crash caused by the commit.
2021-08-26 09:19:40 -07:00
Kazu Hirata cce49dcb85 [IR] Remove addPseudoProbeAttribute (NFC)
The last use was removed on Jun 17, 2021 in commit
bd52495518.
2021-08-26 09:02:26 -07:00
Roman Lebedev a8125bf4a8
[X86][Codegen] PR51615: don't replace wide volatile load with narrow broadcast-from-memory
Even though https://bugs.llvm.org/show_bug.cgi?id=51615
appears to be introduced by D105390, the fix lies here.

We can not replace a wide volatile load with a broadcast-from-memory,
because that would narrow the load, which isn't legal for volatiles.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D108757
2021-08-26 18:46:49 +03:00
Anna Thomas 55bdb14026 [LoopPredication] Preserve MemorySSA
Since LICM has now unconditionally moved to MemorySSA based form, all
passes that run in same LPM as LICM need to preserve MemorySSA (i.e. our
downstream pipeline).

Added loop-mssa to all tests and perform -verify-memoryssa within
LoopPredication itself.

Differential Revision: https://reviews.llvm.org/D108724
2021-08-26 11:36:25 -04:00
Yonghong Song d383df32c0 [DebugInfo] generate btf_tag annotations for DISubprogram types
Generate btf_tag annotations for DISubprogram types.
A field "annotations" is introduced to DISubprogram, and
annotations are represented as an DINodeArray, similar to
DIComposite elements. The following example illustrates how
annotations are encoded in IR:
    distinct !DISubprogram(..., annotations: !10)
    !10 = !{!11, !12}
    !11 = !{!"btf_tag", !"a"}
    !12 = !{!"btf_tag", !"b"}

Differential Revision: https://reviews.llvm.org/D106618
2021-08-26 08:24:19 -07:00
Andrew Wei c9066c5d37 [CGP] Fix the crash for combining address mode when having cyclic dependency
In the combination of addressing modes, when replacing the matched phi nodes,
sometimes the phi node to be replaced has been modified. For example,
there’s matcher set [A, B] and [C, A], which will have cyclic dependency:
A is replaced by B and C will be replaced by A. Because we tried to match new phi node
to another new phi node, we should ignore new phi nodes when mapping new phi node to old one.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D108635
2021-08-26 22:52:42 +08:00
Jacob Bramley 05f3219b38 [AArch64] Lower fpto*i.sat intrinsics for NEON.
Following on from D102353, extend the fpto*i.sat intrinsics to use NEON
fcvt* instructions.

Differential Revision: https://reviews.llvm.org/D108460
2021-08-26 15:37:00 +01:00
Alexey Bataev a28234e37a [SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.

Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-08-26 07:19:07 -07:00
Simon Pilgrim c17f5afa88 [X86] getShape - don't dereference dyn_cast<>
dyn_cast can return nullptr, use cast<> to assert we have the correct type.
2021-08-26 15:08:13 +01:00
Simon Pilgrim 47f2affa08 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. 2021-08-26 15:08:12 +01:00
Andrew Wei 99c4336374 [LoopDataPrefetch] Add missed LoopSimplify dependence for prefetch pass
SCEVExpander::expandCodeFor may expand add recurrences for loop with a preheader,
so we should make LoopDataPrefetch dependent on LoopSimplify.
This patch will try to fix : https://bugs.llvm.org/show_bug.cgi?id=43784

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D108448
2021-08-26 21:01:59 +08:00
Jessica Clarke 8f89e2f6c9 [AMDGPU] Remove dead and broken ComplexPatterns
SelectADDRParam was discovered as being dead 5 years ago and removed in
7b4ef068c6 but the unused ComplexPattern definition was left behind.
SelectADDRDWord has never existed as far as I can tell, even back when
AMDGPU was R600-only and called that.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108758
2021-08-26 12:48:32 +01:00
Andrea Di Biagio 4a5b191703 [X86][MCA] Address the latest issues with MULX reported in PR51495.
It turns out that SchedWrite WriteIMulH was always assigned to the low half of
the result of a MULX (rather than to the high half).

To avoid confusion, this patch swaps the two MULX writes in the tablegen
definition of MULX32/64.  That way, write names better describe what they
actually refer to; this also avoids further complications if in future we decide
to reuse the same MulH writes to also model other scalar integer multiply
instructions.  I also had to swap the latency values for the two MULX writes to
make sure that the change is effectively an NFC. In fact, none of the existing
x86 tests were affected by this small refactoring.

This patch also fixes a bug in MCA: a wrong latency value was propagated for
instructions that perform multiple writes to a same register.  This last issue
was found by Roman while testing MULX on targets that define a different latency
for the Low/High part of the result.

Differential Revision: https://reviews.llvm.org/D108727
2021-08-26 12:08:20 +01:00
Matthew Devereau 9b830c798e [AArch64][SVE] Teach cost model masked gathers/scatters are cheap
Tell the cost model to use the scalable calculation for non-neon fixed vector.
This results in a cheaper cost for fixed-length SVE masked gathers/scatters
allowing the vectorizor to emit them more frequently.
2021-08-26 11:17:47 +01:00
Florian Hahn aa5b6c9779
[ConstraintElimination] Initial support for using info from assumes.
This patch adds initial support to use facts from @llvm.assume calls. It
intentionally does not handle all possible cases to keep things simple
initially.

For now, the condition from an assume is made available on entry to the
containing block, if the assume is guaranteed to execute. Otherwise it
is only made available in the successor blocks.
2021-08-26 10:08:00 +01:00
David Green 6ffc6951a3 [AArch64] Remove unpredictable from narrowing instructions.
Like other similar instructions the xtn2 family do not have side
effects, and explicitly marking them as such can help improve scheduling
freedom.
2021-08-26 09:43:44 +01:00
Jay Foad 985eb25546 [MachineScheduler] Fix tracing
Consistently print a newline before "RegionInstrs:".
2021-08-26 09:27:01 +01:00
Esme-Yi b21ed75e10 [llvm-readobj][XCOFF] Add support for `--needed-libs` option.
Summary: This patch is trying to add support for llvm-readobj
--needed-libs option under XCOFF.
For XCOFF, the needed libraries can be found from the Import
File ID Name Table of the Loader Section.
Currently, I am using binary inputs in the test since yaml2obj
does not yet support for writing the Loader Section and the
import file table.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D106643
2021-08-26 07:17:06 +00:00
Wenlei He a45d72e024 [CSSPGO] Add switch for sample loader to honor global pre-inliner decision from llvm-profgen
The change adds a switch to allow sample loader to use global pre-inliner's decision instead. The pre-inliner in llvm-profgen makes inline decision globally based on whole program profile and function byte size as cost proxy.

Since pre-inliner also adjusts/merges context profile based on its inline decision, honoring its inline decision in sample loader would lead to better post-inline profile quality especially for thinlto where cross module profile merging isn't possible without pre-inliner.

Minor fix in profile reader is also included. When pre-inliner is use, we now also turn off the default merging and trimming logic unless it's explicitly asked.

Differential Revision: https://reviews.llvm.org/D108677
2021-08-25 17:20:15 -07:00
Heejin Ahn e849d99df1 [WebAssembly] Use entry block only for initializations in EmSjLj
Emscripten SjLj transformation is done in four steps. This will be
mostly the same for the soon-to-be-added Wasm SjLj; the step 1, 3, and 4
will be shared and there will be separate way of doing step 2.
1. Initialize `setjmpTable` and `setjmpTableSize` in the entry BB
2. Handle `setjmp` callsites
3. Handle `longjmp` callsites
4. Cleanup and update SSA

We initialize `setjmpTable` and `setjmpTableSize` in the entry BB. But
if the entry BB contains a `setjmp` call, some `setjmp` handling
transformation will also happen in the entry BB, such as calling
`saveSetjmp`.

This is fine for Emscripten SjLj but not for Wasm SjLj, because in Wasm
SjLj we will add a dispatch BB that contains a `switch` right after the
entry BB, from which we jump to one of post-`setjmp` BBs. And this
dispatch BB should precede all `setjmp` calls.

Emscripten SjLj (current):
```
entry:
  %setjmpTable = ...
  %setjmpTableSize = ...
  ...
  call @saveSetjmp(...)
```

Wasm SjLj (follow-up):
```
entry:
  %setjmpTable = ...
  %setjmpTableSize = ...

setjmp.dispatch:
  ...
  ; Jump to the right post-setjmp BB, if we are returning from a
  ; longjmp. If this is the first setjmp call, go to %entry.split.
  switch i32 %no, label %entry.split [
    i32 1, label %post.setjmp1
    i32 2, label %post.setjmp2
    ...
    i32 N, label %post.setjmpN
  ]

entry.split:
  ...
  call @saveSetjmp(...)
```

So in Wasm SjLj we split the entry BB to make the entry block only for
`setjmpTable` and `setjmpTableSize` initialization and insert a
`setjmp.dispatch` BB. (This part is not in this CL. This will be a
follow-up.) But note that Emscripten SjLj and Wasm SjLj share all
steps except for the step 2. If we only split the entry BB only for Wasm
SjLj, there will be one more `if`-`else` and the code will be more
complicated.

So this CL splits the entry BB in Emscripten SjLj and put only
initialization stuff there as follows:
Emscripten SjLj (this CL):
```
entry:
  %setjmpTable = ...
  %setjmpTableSize = ...
  br %entry.split

entry.split:
  ...
  call @saveSetjmp(...)
```
This is just done to share code with Wasm SjLj. It adds an unnecessary
branch but this will be removed in later optimization passes anyway.

This is in effect NFC, meaning the program behavior will not change, but
existing ll tests files have changed because the entry block was split.
The reason I upload this in a separate CL is to make the Wasm SjLj diff
tidier, because this changes many existing Emscripten SjLj tests, which
can be confusing for the follow-up Wasm SjLj CL.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D108729
2021-08-25 15:46:57 -07:00
Heejin Ahn 2f88a30ca6 [WebAssembly] Extract longjmp handling in EmSjLj to a function (NFC)
Emscripten SjLj and (soon-to-be-added) Wasm SjLj transformation share
many steps:
1. Initialize `setjmpTable` and `setjmpTableSize` in the entry BB
2. Handle `setjmp` callsites
3. Handle `longjmp` callsites
4. Cleanup and update SSA

1, 3, and 4 are identical for Emscripten SjLj and Wasm SjLj. Only the
step 2 is different. This CL extracts the current Emscripten SjLj's
longjmp callsites handling into a function. The reason to make this a
separate CL is, without this, the diff tool cannot compare things well
in the presence of moved code and added code in the followup Wasm SjLj
CL, and it ends up mixing them together, making the diff unreadable.

Also fixes some typos and variable names. So far we've been calling the
buffer argument to `setjmp` and `longjmp` `jmpbuf`, but the name used in
the man page for those functions is `env`, so updated them to be
consistent.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D108728
2021-08-25 15:45:38 -07:00
Ricky Taylor f659b6b1fa [M68k][NFC] Rename M68kOperand::Kind to KindTy
Rename the M68kOperand::Type enumeration to KindTy to avoid ambiguity
with the Kind field when referencing enumeration values e.g.
`Kind::Value`.

This works around a compilation error under GCC 5, where GCC won't
lookup enum class values if you have a similarly named field
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60994).

The error in question is:
`M68kAsmParser.cpp:857:8: error: 'Kind' is not a class, namespace, or enumeration`

Differential Revision: https://reviews.llvm.org/D108723
2021-08-25 22:24:43 +01:00
Heejin Ahn c2c9a3fd9c [WebAssembly] Rename wasm.catch.exn intrinsic back to wasm.catch
The plan was to use `wasm.catch.exn` intrinsic to catch exceptions and
add `wasm.catch.longjmp` intrinsic, that returns two values (setjmp
buffer and return value), later to catch longjmps. But because we
decided not to use multivalue support at the moment, we are going to use
one intrinsic that returns a single value for both exceptions and
longjmps. And even if it's not for that, I now think the naming of
`wasm.catch.exn` is a little weird, because the intrinsic can still take
a tag immediate, which means it can be used for anything, not only
exceptions, as long as that returns a single value.

This partially reverts D107405.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D108683
2021-08-25 14:19:22 -07:00
Sanjay Patel e728d1a3e8 [DAGCombiner] create binop nodes with all of expected values
This is another bug exposed by https://llvm.org/PR51612
(and the one that triggered the initial assertion) in the report.

That example was suppressed with:
985b48f183

...but these would still crash because we created nodes
like UADDO without the expected 2 output values.
2021-08-25 16:14:22 -04:00
Patrick Holland fe01014faa [MCA] Moved View.h and View.cpp from /tools/llvm-mca/ to /lib/MCA/.
Moved View.h and View.cpp from /tools/llvm-mca/Views/ to /lib/MCA/ and
/include/llvm/MCA/. This is so that targets can define their own Views within
the /lib/Target/ directory (so that the View can use backend functionality).
To enable these Views within mca, targets will need to add them to the vector of
Views returned by their target's CustomBehaviour::getViews() methods.

Differential Revision: https://reviews.llvm.org/D108520
2021-08-25 12:12:47 -07:00
Sanjay Patel 985b48f183 [DAGCombiner] check uses more strictly on select-of-binop fold
There are 2 bugs here:
1. We were not checking uses of operand 2 (the false value of the select).
2. We were not checking for multiple uses of nodes that produce >1 result.

Correcting those is enough to avoid the crash in the reduced test based on:
https://llvm.org/PR51612

The additional use check on operand 0 (the condition value of the select)
should not strictly be necessary because we are only replacing one use
with another (whether it makes performance sense to do the transform with
that pattern is not clear). But as noted in the TODO, changing that
uncovers another bug.

Note: there's at least one more bug here - we aren't propagating EVTs
correctly, but I plan to fix that in another patch.
2021-08-25 14:14:41 -04:00
Nick Desaulniers 846e562dcc [Clang] add support for error+warning fn attrs
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions whom should not be called.

They are frequently used to provide compile time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (ie. relying on
compiler optimizations). This is also similar to diagnose_if function
attribute, but can diagnose after optimizations have been run.

While users may instead simply call undefined functions in such cases to
get a linkage failure from the linker, these provide a much more
ergonomic and actionable diagnostic to users and do so at compile time
rather than at link time. Users instead may be able use inline asm .err
directives.

These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so I will send once
this feature is landed.

To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr).  Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.

The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.

The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.

Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D106030
2021-08-25 10:34:18 -07:00
Nathan Sidwell ab55cc6cef [X86] pr51000 in-register struct return tailcalling
In-register structure returns are not special, and handled by lowering
to multiple-value tuples.  We can tail-call from non-sret fns to
structure-returning functions, except on i686 where the sret pointer
is callee-pop.

Differential Revision: https://reviews.llvm.org/D105807
2021-08-25 10:15:50 -07:00
Stanislav Mekhanoshin 11b7ee974a [AMDGPU] Avoid assert for saved FP
With spilling into AGPRs enabled we cannot reliably predict
if we need to save FP or not. We may finally spill everything
into AGPRs and never touch stack. In this case we still may
save FP. This is deficiency but not an error, so avoid the
assert.

Differential Revision: https://reviews.llvm.org/D107404
2021-08-25 09:50:59 -07:00
Alexey Bataev a36bc873a2 [SLP]No need to schedule/check parent for extract{element/value} instruction.
The instruction extractelement/extractvalue are not required to
be scheduled since they only depend on the source vector/aggregate (with
constant indices), smae applies to the parent basic block checks.
Improves compile time and saves scheduling budget.

Differential Revision: https://reviews.llvm.org/D108703
2021-08-25 09:27:55 -07:00
Rong Xu 24201b6437 [SampleFDO] Set ProfileIsFS bit properly from the internal option
We have "-profile-isfs" internal option for text, binary, and
compactbinary format (mostly for debug and test purpose). We
need to set the related flag in FunctionSamples so that ProfileIsFS
is written to the header in extbinary format.

Differential Revision: https://reviews.llvm.org/D108707
2021-08-25 09:07:34 -07:00
Wenlei He a6f15e9a49 [CSSPGO] Use probe inline tree to track zero size fully optimized context for pre-inliner
This is a follow up diff for BinarySizeContextTracker to track zero size for fully optimized inlinee. When an inlinee is fully optimized away, we won't be able to get its size through symbolizing instructions, hence we will treat the corresponding context size as unknown. However by traversing the inlined probe forest, we know what're original inlinees regardless of optimization. If a context show up in inlined probes, but not during symbolization, we know that it's fully optimized away hence its size is zero instead of unknown. It should provide more accurate size cost estimation for pre-inliner to make better inline decisions in llvm-profgen.

Differential Revision: https://reviews.llvm.org/D108350
2021-08-25 09:01:11 -07:00
Kirill Stoimenov 832aae738b [asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86.
The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using Reg - X86::NoRegister formula.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D107850
2021-08-25 15:31:46 +00:00
alex-t ed0f4415f0 [AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form
             dependent of the SDNode divergence flag.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D106079
2021-08-25 18:30:49 +03:00
Neumann Hon 6b94777be5 [SystemZ] [NFC] Replace SpecialRegisters field with a unique_ptr instead of a raw pointer.
This patch replaces the SpecialRegisters field with a unique_ptr instead of a raw pointer. This is better practice, and allows us to remove the definition of the dtor for the SystemZSubtarget class.

Reviewed By: uweigand, Kai

Differential Revision: https://reviews.llvm.org/D108639
2021-08-25 11:28:18 -04:00
Andrea Di Biagio 5f848b311f [X86][SchedModel] Fix latency the Hi register write of MULX (PR51495).
Before this patch, WriteIMulH reported a latency value which is correct for the
RR variant of MULX, but not for the RM variant.

This patch fixes the issue by introducing a new WriteIMulHLd, which is meant to
be used only by the RM variant of MULX.

Differential Revision: https://reviews.llvm.org/D108701
2021-08-25 16:12:09 +01:00
Vyacheslav Zakharin 2e192ab1f4 [CodeExtractor] Preserve topological order for the return blocks.
Differential Revision: https://reviews.llvm.org/D108673
2021-08-25 08:09:01 -07:00
Thomas Johnson 8c3886b0ec [ARC] Add ADC (addition with carry) and SBC (subtraction with carry) instructions
Differential Revision: https://reviews.llvm.org/D108672
2021-08-25 07:46:15 -07:00
Nicholas Guy 36fcf47fc8 [AArch64] Generate SMOV in place of sext(fmov(...))
A single smov instruction is capable of moving from a vector register while performing
the sign-extend during said move, rather than each step being performed by separate instructions.

Differential Revision: https://reviews.llvm.org/D108633
2021-08-25 15:23:22 +01:00
Jeremy Morse 0116ed0069 [DebugInfo][InstrRef] Don't use instr-ref for unoptimised functions
InstrRefBasedLDV is marginally slower than VarlocBasedLDV when analysing
optimised code -- however, it's much slower when analysing code compiled
-O0.

To avoid this: don't use instruction referencing for -O0 functions. In the
"pure" case of unoptimised code, this won't really harm the debugging
experience because most variables won't have been promoted off the stack,
so can't go missing. It becomes more complicated when optimised code is
inlined into functions marked optnone; however these are rare, and as -O0
doesn't run many optimisations there should be little damage to the debug
experience as a result.

I've taken the opportunity to refactor testing for instruction-referencing
into a MachineFunction method, which seems the most appropriate place to
put it.

Differential Revision: https://reviews.llvm.org/D108585
2021-08-25 15:10:36 +01:00
Joe Nash e381833ba5 [AMDGPU] Support global_atomic_fmin/max on gfx10
Makes patterns added for gfx90a usable with the gfx10 versions of the
insts.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108654

Change-Id: I86167bf6b4823f975f74ccb619bd6190331ba16b
2021-08-25 09:35:10 -04:00
Florian Hahn 90d09eb300
[LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks.
Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6.

So far it has only been enabled for loops where all non-latch exits are
'de-optimizing' exits (D63923). But peeling of multi-exit loops can be
highly beneficial in other cases too, like if all non-latch exiting
blocks are unreachable.

The motivating case are loops with runtime checks, like the C++ example
below. The main issue preventing vectorization is that the invariant
accesses to load the bounds of B is conditionally executed in the loop
and cannot be hoisted out. If we peel off the first iteration, they
become dereferenceable in the loop, because they must execute before the
loop is executed, as all non-latch exits are terminated with
unreachable. This subsequently allows hoisting the loads and runtime
checks out of the loop, allowing vectorization of the loop.

     int sum(std::vector<int> *A, std::vector<int> *B, int N) {
       int cost = 0;
       for (int i = 0; i < N; ++i)
         cost += A->at(i) + B->at(i);
       return cost;
     }

This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64.

Note that this requires a follow-up improvement to the peeling cost
model to actually peel iterations off loops as above. I will share that
shortly.

Also, peeling of multi-exits might be beneficial for exit blocks with
other terminators, but I would like to keep the scope limited to known
high-reward cases for now.

I removed the option to disable peeling for multi-deopt exits because
the code is more general now. Alternatively, the option could also be
generalized, but I am not sure if there's much value in the option?

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D108108
2021-08-25 13:26:40 +01:00
Dawid Jurczak bdcf04246c [LoopIdiom] Don't transform loop into memmove when load from body has more than one use
This change fixes issue found by Markus: https://reviews.llvm.org/rG11338e998df1
Before this patch following code was transformed to memmove:

for (int i = 15; i >= 1; i--) {
  p[i] = p[i-1];
  sum += p[i-1];
}

However load from p[i-1] is used not only by store to p[i] but also by sum computation.
Therefore we cannot emit memmove in loop header.

Differential Revision: https://reviews.llvm.org/D107964
2021-08-25 14:22:40 +02:00
Peilin Guo 4c4dbeeeea [DAGCombine] Check the legality of the index of EXTRACT_SUBVECTOR
For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant
multiple of the known-minimum vector length of the result type.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107795
2021-08-25 19:33:39 +08:00
Jeremy Morse cc1e87bf55 [DebugInfo][InstrRef] Avoid stack-slot-coloring changing codegen due to DI
Stack slot colouring adds "weight" to slots if a non-dbg-value instruction
refers to it. This, unfortunately, means that DBG_PHI instructions can have
an effect on codegen. The fix is very simple, replace isDebugValue with
isDebugInstr.

The regression test contains a scenario that reproduces this problem; I've
represented both normal-debug mode and instr-ref debug mode instructions
in comment lines prefixed with AAAAAA and BBBBBB, and un-comment them with
sed to test that the two different modes produce the same behaviour.

Differential Revision: https://reviews.llvm.org/D108627
2021-08-25 12:04:59 +01:00
Rosie Sumpter e221724714 [LoopFlatten] Add statistic for number of loops flattened. NFC
Differential Revision: https://reviews.llvm.org/D108644
2021-08-25 10:10:10 +01:00
Daniil Fukalov 48958d02d2 [NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()`
   and `R600TargetMachine::getSubtargetImpl()` had different return value type
   than base class.
4. Minor forward declarations cleanup.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108596
2021-08-25 12:01:55 +03:00
Konstantin Schwarz 4b4bc1ea16 [GlobalISel] Do not generate illegal G_SEXTLOADs after legalization
The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check,
leading to the generation of potentially illegal G_SEXTLOADs when run after legalization.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108626
2021-08-25 10:13:39 +02:00
Vang Thao 549f6a819a [MachineCopyPropagation] Check CrossCopyRegClass for cross-class copys
On some AMDGPU subtargets, copying to and from AGPR registers using another
AGPR register is not possible. A intermediate VGPR register is needed for AGPR
to AGPR copy. This is an issue when machine copy propagation forwards a
COPY $agpr, replacing a COPY $vgpr which results in $agpr = COPY $agpr. It is
removing a cross class copy that may have been optimized by previous passes and
potentially creating an unoptimized cross class copy later on.

To avoid this issue, check CrossCopyRegClass if a different register class will
be needed for the copy. If so then avoid forwarding the copy when the
destination does not match the desired register class and if the original copy
already matches the desired register class.

Issue seen while attempting to optimize another AGPR to AGPR issue:

Live-ins: $agpr0
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $vgpr0
$agpr3 = COPY $vgpr0
$agpr4 = COPY $vgpr0

After machine-cp:

$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $agpr0
$agpr3 = COPY $agpr0
$agpr4 = COPY $agpr0

Machine-cp propagated COPY $agpr0 to replace $vgpr0 creating 3 AGPR to AGPR
copys. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each
copy when the prior VGPR->AGPR copy was already optimal.

Reviewed By: lkail, rampitec

Differential Revision: https://reviews.llvm.org/D108011
2021-08-24 21:22:36 -07:00
Lang Hames 2a35d59b2f [JITLink][MachO] Add more detail to error message. 2021-08-25 13:31:52 +10:00
Lang Hames fc3b2675e7 [ORC] Fix typo in debugging output 2021-08-25 13:31:51 +10:00
Fangrui Song 9ab9a9595b [InstrProfiling] Keep profd non-private for non-renamable comdat functions
The NS==0 condition used by D103717 missed a corner case: if the current copy
does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a
different CFG) may exist. This is super rare, but is possible with pre-inlining
PGO instrumentation (which can make a weak_odr function inlines its callees
differently, sometimes with value profiling while sometimes without).

If the current copy with private profd is prevailing, the non-prevailing copy
may get an undefined symbol if a caller inlining the non-prevailing function
references its profd. If the other copy with non-private profd is prevailing,
the current copy may cause a "relocation to discarded section" linker error.

The fix is straightforward: just keep non-private profd in such a `DataReferencedByCode` case.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 0.08% larger (172431496/172286720-1).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 0.026% larger.
The majority of D103717's benefits remains.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D108432
2021-08-24 20:14:03 -07:00
Thomas Lively 977eeb0c38 [WebAssembly] Fix some UB from ca541aa319 2021-08-24 19:44:03 -07:00
Fangrui Song 32e2326cda Revert D108432 "[InstrProfiling] Keep profd non-private for non-renamable comdat functions"
This reverts commit f653beea88.

It broke Windows coverage-inline.cpp because link.exe has a limitation
that external symbols in IMAGE_COMDAT_SELECT_ASSOCIATIVE don't work.

It essentially dropped the previous size optimization for coverage
because coverage doesn't rename comdat by default.
Needs more investigation what we should do.
2021-08-24 19:16:07 -07:00
Heejin Ahn d5244fb160 [WebAssembly] Use SSAUpdaterBulk in LowerEmscriptenSjLj
We update SSA in two steps in Emscripten SjLj:
1. Rewrite uses of `setjmpTable` and `setjmpTableSize` variables and
   place `phi`s where necessary, which are updated where we call
   `saveSetjmp`.
2. Do a whole function level SSA update for all variables, because we
   split BBs where `setjmp` is called and there are possibly variable
   uses that are not dominated by a def.
   (See 955b91c19c/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L1314-L1324))

We have been using `SSAUpdater` to do this, but `SSAUpdaterBulk` class
was added after this pass was first created, and for the step 2 it looks
like a better alternative with a possible performance benefit. Not sure
the author is aware of it, but `SSAUpdaterBulk` seems to have a
limitation: it cannot handle a use within the same BB as a def but
before it. For example:
```
... = %a + 1
%a = foo();
```
or
```
%a = %a + 1
```
The uses `%a` in RHS should be rewritten with another SSA variable of
`%a`, most likely one generated from a `phi`. But `SSAUpdaterBulk`
thinks all uses of `%a` are below the def of `%a` within the same BB.
(`SSAUpdater` has two different functions of rewriting because of this:
`RewriteUse` and `RewriteUseAfterInsertions`.) This doesn't affect our
usage in the step 2 because that deals with possibly non-dominated uses
by defs after block splitting. But it does in the step 1, which still
uses `SSAUpdater`.

But this CL also simplifies the step 1 by using `make_early_inc_range`,
removing the need to advance the iterator before rewriting a use.

This is NFC; the test changes are just the order of PHI nodes.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D108583
2021-08-24 18:23:51 -07:00
Heejin Ahn 77b921b870 [WebAssembly] Tidy up EH/SjLj options
This CL is small, but the description can be a little long because I'm
trying to sum up the status quo for Emscripten/Wasm EH/SjLj options.

First, this CL adds an option for Wasm SjLj (`-wasm-enable-sjlj`), which
handles SjLj using Wasm EH. The implementation for this will be added as
a followup CL, but this adds the option first to do error checking.

This also adds an option for Wasm EH (`-wasm-enable-eh`), which has been
already implemented. Before we used `-exception-model=wasm` as the same
meaning as enabling Wasm EH, but after we add Wasm SjLj, it will be
possible to use Wasm EH instructions for Wasm SjLj while not enabling
EH, so going forward, to use Wasm EH, `opt` and `llc` will need this
option. This only affects `opt` and `llc` command lines and does not
affect Emscripten user interface.

Now we have two modes of EH (Emscripten/Wasm) and also two modes of SjLj
(also Emscripten/Wasm). The options corresponding to each of are:
- Emscripten EH: `-enable-emscripten-cxx-exceptions`
- Emscripten SjLj: `-enable-emscripten-sjlj`
- Wasm EH: `-wasm-enable-eh -exception-model=wasm`
           `-mattr=+exception-handling`
- Wasm SjLj: `-wasm-enable-sjlj -exception-model=wasm`
             `-mattr=+exception-handling`
The reason Wasm EH/SjLj's options are a little complicated are
`-exception-model` and `-mattr` are common LLVM options ane not under
our control. (`-mattr` can be omitted if it is embedded within the
bitcode file.)

And we have the following rules of the option composition:
- Emscripten EH and Wasm EH cannot be turned on at the same itme
- Emscripten SjLj and Wasm SjLj cannot be turned on at the same time
- Wasm SjLj should be used with Wasm EH

Which means we now allow these combinations:
- Emscripten EH + Emscripten SjLj: the current default in `emcc`
- Wasm EH + Emscripten SjLj:
  This is allowed, but only as an interim step in which we are testing
  Wasm EH but not yet have a working implementation of Wasm SjLj. This
  will error out (D107687) in compile time if `setjmp` is called in a
  function in which Wasm exception is used.
- Wasm EH + Wasm SjLj:
  This will be the default mode later when using Wasm EH. Currently Wasm
  SjLj implementation doesn't exist, so it doesn't work.
- Emscripten EH + Wasm SjLj will not work.

This CL moves these error checking routines to
`WebAssemblyPassConfig::addIRPasses`. Not sure if this is an ideal place
to do this, but I couldn't find elsewhere. Currently some checking is
done within LowerEmscriptenEHSjLj, but these checks only run if
LowerEmscriptenEHSjLj runs so it may not run when Wasm EH is used. This
moves that to `addIRPasses` and adds some more checks.

Currently LowerEmscriptenEHSjLj pass is responsible for Emscripten EH
and Emscripten SjLj. Wasm EH transformations are done in multiple
places, including WasmEHPrepare, LateEHPrepare, and CFGStackify. But in
the followup CL, LowerEmscriptenEHSjLj pass will be also responsible for
a part of Wasm SjLj transformation, because WasmSjLj will also be using
several Emscripten library functions, and we will be sharing more than
half of the transformation to do that between Emscripten SjLj and Wasm
SjLj.

Currently we have `-enable-emscripten-cxx-exceptions` and
`-enable-emscripten-sjlj` but these only work for `llc`, because for
`llc` we feed these options to the pass but when we run the pass using
`opt` the pass will be created with no options and the default options
will be used, which turns both Emscripten EH and Emscripten SjLj on.

Now we have one more SjLj option to care for, LowerEmscriptenEHSjLj pass
needs a finer way to control these options. This CL removes those
default parameters and make LowerEmscriptenEHSjLj pass read directly
from command line options specified. So if we only run
`opt -wasm-lower-em-ehsjlj`, currently both Emscripten EH and Emscripten
SjLj will run, but with this CL, none will run unless we additionally
pass `-enable-emscripten-cxx-exceptions` or `-enable-emscripten-sjlj`,
or both. This does not affect users; this only affects our `opt` tests
because `emcc` will not call either `opt` or `llc`. As a result of this,
our existing Emscripten EH/SjLj tests gained one or both of those
options in their `RUN` lines.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D107685
2021-08-24 17:54:39 -07:00
Shimin Cui cea5ab090b [GlobalOpt] Fix the assert for null check of global value
This is to fix the reported assert - https://bugs.llvm.org/show_bug.cgi?id=51608.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D108674
2021-08-24 20:47:33 -04:00
Thomas Lively ca541aa319 [WebAssembly] Fix up out-of-range BUILD_VECTOR lane constants
Fixes PR51605 in which a DAG combine and legalization sequence generated
out-of-range constants in BUILD_VECTOR lanes. In the v16i8 case, the constants
were 255, which would be in range if DAG ISel used unsigned constants, but it is
out of range because DAG ISel uses signed constants.

Differential Revision: https://reviews.llvm.org/D108669
2021-08-24 17:24:03 -07:00
Amara Emerson 2ed8053d46 Revert "[AArch64][GlobalISel] Don't contract cross-bank copies into truncating stores."
This reverts commit 67bf3ac744.

The reason is that this change is now superseded by 04fb9b729a which fixes the
underlying problem in the selector. Now it's fine to generate truncating FP stores
since the selector code will just generate subreg copies to handle them.
2021-08-24 16:26:56 -07:00
Amara Emerson 04fb9b729a [AArch64][GlobalISel] Fix incorrect handling of fp truncating stores.
When the tablegen patterns fail to select a truncating scalar FPR store,
our manual selection code also failed to handle it silently, trying to
generate an invalid copy. Fix this by adding support in the manual code
to generate a proper subreg copy before selecting a non-truncating store.
2021-08-24 16:07:00 -07:00
Fangrui Song f653beea88 [InstrProfiling] Keep profd non-private for non-renamable comdat functions
The NS==0 condition used by D103717 missed a corner case: if the current copy
does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a
different CFG) may exist. This is super rare, but is possible with pre-inlining
PGO instrumentation (which can make a weak_odr function inlines its callees
differently, sometimes with value profiling while sometimes without).

If the current copy with private profd is prevailing, the non-prevailing copy
may get an undefined symbol if a caller inlining the non-prevailing function
references its profd. If the other copy with non-private profd is prevailing,
the current copy may cause a "relocation to discarded section" linker error.

The fix is straightforward: just keep non-private profd in this case.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 0.08% larger (172431496/172286720-1).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 0.026% larger.
The majority of D103717's benefits remains.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D108432
2021-08-24 15:59:35 -07:00
Philip Reames ec8d87e9f5 [SCEV] Infer nuw from nw for addrecs
This was previously committed in 914836b, and reverted due to confusion on the status of the review.

Differential Revision: https://reviews.llvm.org/D108601
2021-08-24 14:24:05 -07:00
Min-Yih Hsu d2bb6d512c [X86] Add explicit library dependency on LLVMInstrumentation
Patch 9588b685c6 introduced dependency on ASAN. But it didn't
explicitly put LLVMInstrumentation as one of the library dependencies
such that the build will fail if we're building LLVM as shared libraries
(i.e. -DBUILD_SHARED_LIBS=ON).
This patch explicitly links X86CodeGen against the Instrumentation
component.

Differential Revision: https://reviews.llvm.org/D108662
2021-08-24 14:08:17 -07:00
Jessica Paquette ef8707574b [AArch64][GlobalISel] Legalize narrow scalar FP arithmetic
Widen narrow fp arithmetic ops (e.g. G_FADD). When we don't have full FP16
support, widen to s32. Otherwise widen to s16.

https://godbolt.org/z/TbT9Pqa7e

Differential Revision: https://reviews.llvm.org/D108660
2021-08-24 13:54:28 -07:00
Nico Weber 67ffce68bc Make WindowsManifestMerger::merge() take a MemoryBufferRef
No behavior change.
2021-08-24 16:39:20 -04:00
Patrick Holland e4ebfb5786 [MCA] Adding an AMDGPUCustomBehaviour implementation.
This implementation allows mca to model the desired behaviour of the s_waitcnt
instruction. This patch also adds the RetireOOO flag to the AMDGPU instructions
within the scheduling model. This flag is only used by mca and allows
instructions to finish out-of-order which helps mca's simulations more closely
model the actual device.

Differential Revision: https://reviews.llvm.org/D104730
2021-08-24 13:33:58 -07:00
Kirill Stoimenov b97ca3aca1 Revert "[asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86."
This reverts commit 9588b685c6. Breaks a bunch of builds.

Reviewed By: GMNGeoffrey

Differential Revision: https://reviews.llvm.org/D108658
2021-08-24 13:21:20 -07:00
Sanjay Patel 204038d52e [InstSimplify] fold or+shifted -1 to -1
These are similar to the rotate pattern added with:
dcf659e821
...but we don't have guard ops on the shift amount,
so we don't canonicalize to the intrinsic.

  declare void @llvm.assume(i1)

  define i32 @src(i32 %shamt, i32 %bitwidth) {
    ; subtract must be in range of bitwidth
    %lt = icmp ule i32 %bitwidth, 32
    call void @llvm.assume(i1 %lt)

    %r = lshr i32 -1, %shamt
    %s = sub i32 %bitwidth, %shamt
    %l = shl i32 -1, %s
    %o = or i32 %r, %l
    ret i32 %o
  }

  define i32 @tgt(i32 %shamt, i32 %bitwidth) {
    ret i32 -1
  }

https://alive2.llvm.org/ce/z/aF7WHx
2021-08-24 15:38:38 -04:00
Kirill Stoimenov 9588b685c6 [asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86.
The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using Reg - X86::NoRegister formula.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D107850
2021-08-24 19:34:34 +00:00
Thomas Johnson ce1dc9d647 [ARC] Add codegen for the readcyclecounter intrinsic along with disassembly for associated instructions
Differential Revision: https://reviews.llvm.org/D108598
2021-08-24 11:53:20 -07:00
Stanislav Mekhanoshin 92c1fd19ab Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-24 11:09:02 -07:00
Jessica Paquette db232de193 [AArch64][GlobalISel] Legalize + select v2p0 -> v264 G_PTRTOINT
1) Just mark this case as legal because it can just be a copy.

2) Ensure the copy in the existing code actually gets selected. Without doing
this, we'll crash because the destination won't have a register class.

This fell back 35 times in a build of clang with GISel for AArch64.

Differential Revision: https://reviews.llvm.org/D108610
2021-08-24 11:02:01 -07:00
Arthur Eubanks b109becdce [NFC] Add and use AttributeList::removeFnAttributes() 2021-08-24 09:53:39 -07:00
Rong Xu de620f5b13 [CSPGO] Fix lost IRPGOFlag in CSPGO instrumentation
The IRPGOFlag symbol (__llvm_profile_raw_version) is dropped when
identified as non-prevailing for either regular or thin LTO during
the mixed-LTO mode compilation. This happens in the module where
IRPGOFlag is marked as non-prevailing. This variable
is emitted in the final object from the prevailing module.

This is still problematic because we currently query this symbol
to coordinate some actions between PGOInstrumentation pass
and InstrProfiling lowering pass, like whether to do value
profiling, whether to do comdat renaming.

This problem is bought up by YolandaCY in
https://reviews.llvm.org/D107034
YolandCY reported unresolved symbol linker errors in
CSPGO instrumentation build for chromium.

This patch let LTO retain IRPGOFlag decl by adding it to
CompilerUsed list and relax the check in isIRPGOFlagSet() when
doing the InstrProfiling lowering.

The test case in the patch is from D107034
<https://reviews.llvm.org/D107034>.

Differential Revision: https://reviews.llvm.org/D108581
2021-08-24 09:41:29 -07:00
Philip Reames 58582bae63 Revert "[SCEV] Infer nsw/nuw from nw for addrecs"
This reverts commit 914836b1c8.  Further comments on review came up after initial approval.  Reverting while addressing.
2021-08-24 09:28:37 -07:00
Jessica Paquette 67d4dd5c07 [AArch64][GlobalISel] Select @llvm.aarch64.neon.ld4.*
Reuse the selection code from the ld2 case. This is similar to how SDAG handles
things in AArch64ISelDAGToDAG. (See SelectLoad)

This fell back ~100 times while building clang with GISel enabled for AArch64.

Factoring out the gross subreg copy part ought to make selecting the rest of
this family fairly easy.

Differential Revision: https://reviews.llvm.org/D108600
2021-08-24 09:03:49 -07:00
Philip Reames 1e07f19bfc Revert "Special case common branch patterns in breakLoopBackedge"
This reverts commit aec08e8600.

Several problems have been reported with malformed loopinfo after this change, see discussion on https://reviews.llvm.org/rGaec08e86004b.
2021-08-24 08:53:42 -07:00
Philip Reames 914836b1c8 [SCEV] Infer nsw/nuw from nw for addrecs
If we no an addrec doesn't self-wrap, the increment is strictly positive, and the start value is the smallest representable value, then we know that the corresponding wrap type can not occur.

Differential Revision: https://reviews.llvm.org/D108601
2021-08-24 08:53:21 -07:00
Simon Pilgrim 307890f85b [X86] Freeze vXi8 shl(x,1) -> add(x,x) vector fold (PR50468)
We don't have any vXi8 shift instructions (other than on XOP which is handled separately), so replace the shl(x,1) -> add(x,x) fold with shl(x,1) -> add(freeze(x),freeze(x)) to avoid the undef issues identified in PR50468.

Split off from D106675 as I'm still looking at whether we can fix the vXi16/i32/i64 issues with the D106679 alternative.

Differential Revision: https://reviews.llvm.org/D108139
2021-08-24 16:08:24 +01:00
Simon Pilgrim 194b08000c [DAG] LoadedSlice::canMergeExpensiveCrossRegisterBankCopy - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
2021-08-24 15:28:30 +01:00
Simon Pilgrim 6de0b55188 [DAG] TransformFPLoadStorePair - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for fp->int load/store copies) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

Differential Revision: https://reviews.llvm.org/D108318
2021-08-24 13:11:27 +01:00
Simon Pilgrim e431b280c9 [DAG] CombineConsecutiveLoads - replace getABITypeAlign with allowsMemoryAccess (PR45116)
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for ISD::BUILD_PAIR) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.

This helps in particular for 32-bit X86 cases loading 64-bit size data, reducing codegen diffs vs x86_64.

Differential Revision: https://reviews.llvm.org/D108307
2021-08-24 12:31:22 +01:00
Ricky Taylor 47f52f989b [M68k][AsmParser] Support parsing register masks & fix printing them
Fixes PR51580.

Register masks will now be printed as 'movem.l (%sp), %a0-%a5/%d5'
for example and can now be parsed in the same format.

Previously the printed syntax was 'movem.l (%sp), %a0-%a5,%d', which
didn't match prior art and was too ambiguous to easily parse.

Differential Revision: https://reviews.llvm.org/D108597
2021-08-24 10:40:02 +01:00
Jeremy Morse 992e21eeee [DebugInfo][InstrRef] Fix over-droppage of locations in X86FloatingPoint
Over in D105657, we started dropping instruction numbers (that become
variable locations) from call instructions, as we can't correctly represent
the x87 FP stack. Unfortunately, it turns out that the "special FP
instructions" that this pass transforms includes "every call instruction"
[0]. Thus, we've ended up dropping all return values from all calls. Ouch.

This patch adds a filter: only drop instruction numbers from calls if they
return something on the FP stack. Seeing how LLVM only allows a single
return value, this should drop instruction numbers on anything that returns
a float, and nothing else.

Rather than writing a new test, I've modified the original one to have a
positive and negative case: drop instruction number on a call with an
FP-stack modification, keep it on a plain call.

Differential Revision: https://reviews.llvm.org/D108580
2021-08-24 10:24:07 +01:00
Jingu Kang b52171629f [GVN] Execute performLoopLoadPRE ahead of PerformLoadPRE
Differential Revision: https://reviews.llvm.org/D108204
2021-08-24 09:50:27 +01:00
Cullen Rhodes e9c8973f1c [AArch64][SME] Fix v8.6a bf16 NEON instruction predication
In streaming mode on SME targets only the scalar BFCVT armv8.6-a
instruction is legal, predicate the illegal instructions on NEON to
disable them in streaming mode (see D107902). BFCVT is predicated on
HasNEONorStreamingSVE.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108279
2021-08-24 08:13:57 +00:00
Martin Storsjö 039b469b85 [ARM] Allow using ';' as asm statement separator in MSVC mode
This does the same as D96259, but for ARM, just like AArch64,
using the same comment char as for ELF and MinGW mode.

As the assembly input/output of LLVM is GAS style, trying to
match what MS armasm.exe does isn't needed (because the comment
char used is the least concern when it comes to that; all
directives differ too). If a separate armasm compatible mode
is implemented, it can use its own comment style (just like
llvm-ml implements MS ml.exe compatible assembly parsing).

This fixes building compiler-rt assembly files for ARM in MSVC
mode.

The updated testcase literals-comments.s was only intended to
make sure that '#' isn't interpreted as a comment char.

Differential Revision: https://reviews.llvm.org/D107251
2021-08-24 11:01:49 +03:00
Anton Afanasyev bed587631f [AggressiveInstCombine] Add arithmetic shift right instr to `TruncInstCombine` DAG
Add `ashr` instruction to the DAG post-dominated by `trunc`, allowing
`TruncInstCombine` to reduce bitwidth of expressions containing
these instructions.

We should be shifting by less than the target bitwidth.
Also it is sufficient to require that all truncated bits
of the value-to-be-shifted are sign bits (all zeros or ones) and
one sign bit is left untruncated: https://alive2.llvm.org/ce/z/Ajo2__

Part of https://reviews.llvm.org/D107766

Differential Revision: https://reviews.llvm.org/D108355
2021-08-24 10:41:16 +03:00
Liu, Chen3 b7795eb646 [X86] Building constant vector which element type is half will cause assertion fail.
Fix assertion fail when building con constant vector which element type is half.

Differential Revision: https://reviews.llvm.org/D108612
2021-08-24 14:34:30 +08:00
Wang, Pengfei c728bd5bba [X86] AVX512FP16 instructions enabling 5/6
Enable FP16 FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105268
2021-08-24 09:07:19 +08:00
Philip Reames 96ef794fd0 [SCEV] Add a hasFlags utility to improve readability [NFC] 2021-08-23 17:36:52 -07:00
Jessica Paquette 2ec2b25fba [AArch64][GlobalISel] Select @llvm.aarch64.neon.ld2.*
This is pretty similar to the ST2 selection code in
`AArch64InstructionSelector::selectIntrinsicWithSideEffects`.

This is a GISel equivalent of the ld2 case in `AArch64DAGToDAGISel::Select`.
There's some weirdness there that appears here too (e.g. using ld1 for scalar
cases, which are 1-element vectors in SDAG.)

It's a little gross that we have to create the copy and then select it right
after, but I think we'd need to refactor the existing copy selection code
quite a bit to do better.

This was falling back while building llvm-project with GISel for AArch64.

Differential Revision: https://reviews.llvm.org/D108590
2021-08-23 17:15:53 -07:00
Greg Clayton a58c2e4af0 Fix DWARFDie::getDeclFile(...) to work with DW_AT_specification.
DWARFDie::getDeclFile(...) previously only supported getting the DW_AT_decl_file if the DIE itself contained the DW_AT_decl_file attribute, or if the DIE had a DW_AT_abstract_origin that pointed to another DIE that had a DW_AT_decl_file. This patch allows the function to get the right attribute value if there is a DW_AT_specification that points to another DIE. We also test that if a DW_AT_abtract_origin or DW_AT_specification points to a DIE in another CU with a DW_FORM_ref_addr, that the right line table is used to extract the file index.

Full tests were added for the following cases:
- DIE has a DW_AT_decl_file attribute
- DIE has a DW_AT_abtract_origin that points to another die in the same CU
- DIE has a DW_AT_abtract_origin that points to another die in another CU
- DIE has a DW_AT_specification that points to another die in the same CU
- DIE has a DW_AT_specification that points to another die in another CU

Differential Revision: https://reviews.llvm.org/D108480
2021-08-23 15:43:18 -07:00
Mircea Trofin 1055c5e1d3 [MLGO] Make sure inliner logs when deleting callees
When using final reward (which is now the default), we were skipping
logging decisions that were leading to callee deletion. This fixes that.

Differential Revision: https://reviews.llvm.org/D108587
2021-08-23 14:54:46 -07:00
Azharuddin Mohammed d898693f72 [ExecutionEngine] Use the libunwind __register_frame on Darwin
This was already the case, but the recent change (957334382c) altered
the behavior on some of our bots where __unw_add_dynamic_fde is not
found. This restores the prior behavior on Darwin while also retaining
the new behavior from that change.
2021-08-23 14:51:14 -07:00
Sanjay Patel cc9c545fb4 [InstCombine] generalize subtract with 'not' operands; 2nd try
This is a re-try of 3aa009cc87 which was reverted at
9577fac0fd because it caused an infinite loop.

For the extra test case, either re-ordering the transforms
or adding the extra clause to avoid sub-of-sub is enough
to prevent the infinite compile, but I'm doing both to be
safer.

Original commit message:
The motivation was to get min/max intrinsics to parity
with cmp+select idioms, but this unlocks a few more
folds because isFreeToInvert recognizes add/sub with
constants too.

In the min/max example, we have too many extra uses
for smaller folds to improve things, but this fold
is able to eliminate uses even though we can't reduce
the number of instructions.
2021-08-23 17:06:51 -04:00
Fangrui Song ba6e15d8cc [TargetMachine] Move COFF special case for ExternalSymbolSDNode from shouldAssumeDSOLocal to X86Subtarget
Intended to be NFC. ARM/AArch64 don't appear to need adjustment.

TargetMachine::shouldAssumeDSOLocal is expected to be very simple, ideally
matching isDSOLocal(). The IR producers are expected to set dso_local correctly.
(While some may think this function can make producers' work easier, the
function is really not in a good position to set dso_local. See the various
special cases we duplicate from clang CodeGenModule.cpp.)

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D108514
2021-08-23 13:54:40 -07:00
Artem Belevich 49d982d8cb [CUDA] Add support for CUDA-11.4
Differential Revision: https://reviews.llvm.org/D108239
2021-08-23 13:24:46 -07:00
Simon Pilgrim 10c982e0b3 Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated

As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

Differential Revision: https://reviews.llvm.org/D106450
2021-08-23 21:09:26 +01:00
David Green 50f4ae58eb [AArch64] Correct store ReadAdrBase operand
It appears that the Read operand for stores was being placed on the
first operand (the stored value) not the address base. This adds a
ReadST for the stored value operand, allowing the ReadAdrBase to
correctly act upon the address.

Differential Revision: https://reviews.llvm.org/D108287
2021-08-23 21:07:55 +01:00
Nikita Popov 19dc02e99f [MergeICmps] Allow sinking past non-load/store
This is a followup to D106591. MergeICmps currently only allows
sinking the loads past either instructions that don't write to
memory at all, or simple loads/stores that don't modify the memory
the loads access.

The "simple loads/stores" part of this check doesn't seem necessary
to me -- AA isModRef() already accurately models any operation
that may clobber the memory. For example, in the adjusted test case
the transform is still fine if the call to @foo() isn't readonly,
but inaccessiblememonly -- in both cases, the call cannot modify
the loaded memory.

Differential Revision: https://reviews.llvm.org/D108517
2021-08-23 22:03:49 +02:00
Alina Sbirlea e8723abf43 [DSE] Check post-dominance for malloc+memset->calloc transform.
Aiming to address the regression discussed in
https://reviews.llvm.org/D103009.

Differential Revision: https://reviews.llvm.org/D108485
2021-08-23 12:39:51 -07:00
Stanislav Mekhanoshin 401a45c61b Fix late rematerialization operands check
D106408 enables rematerialization of instructions with virtual
register uses. That has uncovered the bug in the allUsesAvailableAt
implementation: https://bugs.llvm.org/show_bug.cgi?id=51516.

In the majority of cases canRematerializeAt() called to check if
an instruction can be rematerialized before the given UseIdx.
However, SplitEditor::enterIntvAtEnd() calls it to rematerialize
an instruction at the end of a block passing LIS.getMBBEndIdx()
into the check. In the testcase from the bug it has attempted to
rematerialize ADDXri after STRXui in bb.17. The use operand %55
of the ADD is killed by the STRX but that is undetected by the check
because it adjusts passed UseIdx to the reg slot, before the kill.
The value is dead at the index passed to the check however.

This change uses a later of passed UseIdx and its reg slot. This
shall be correct because if are checking an availability of operands
before an instruction that instruction cannot be the one defining
these operands. If we are checking for late rematerialization we
are really interested if operands live past the instruction.

The bug is not exploitable without D106408 but needed to reland
reverted D106408.

Differential Revision: https://reviews.llvm.org/D108475
2021-08-23 12:23:58 -07:00
Zarko Todorovski b575bbd0c7 [PowerPC][AIX] Set the HasAlloca flag in the AIX Traceback Table only if R31 is used as a frame pointer
After c063946476 usage of R31 doesn't necessarily mean
that alloca is used. The `TracebackTable::IsAllocaUsedMask` flag should be set only
when R31 is used as a frame pointer.

On AIX the `function calls alloca' bit seems to be set whenever R31 is
set up as a frame pointer, even when there is no alloca call.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D108141
2021-08-23 15:20:41 -04:00
Greg Clayton e100a41bbe Fix fallback code that gets decl file + line.
When a function has no line table, but does have debug info (DW_TAG_subprogram), we fall back to creating a line table with a single line entry that has the start address of the function and the source file and line of the function declaration. The bug in this code was that we might have a DW_TAG_subprogram that uses a DW_AT_specification or DW_AT_abstract_origin that points to another DIE, and that DIE might be in another compile unit. The bug was we were grabbing the file index value from the DIE, and that index could be from the other DIE in another compile unit that has its own and compleltely different file table, so we might be using a file index from one compile unit with the file table from another. This was causing a crash in llvm-gsymuil when run against dSYM files. dsymutil, the Apple DWARF linker, will often unique types and can end up with more absolute references across different compile units.

The fix is to use the DWARFDie::getDeclFile(...) accessor as it does fetch this information correctly.

Differential Revision: https://reviews.llvm.org/D108497
2021-08-23 11:06:15 -07:00
Jessica Paquette a2c8e17658 [AArch64][GlobalISel] Add regbankselect support for G_LLROUND
Same as G_LROUND: destination should always be a GPR, source should always be
a FPR.

Differential Revision: https://reviews.llvm.org/D108566
2021-08-23 10:32:20 -07:00
Jessica Paquette fe51f9098b [AArch64][GlobalISel] Legalize G_LLROUND for s64 + s32
Same as G_LROUND.

Also add a TODO for full fp16 legalization.

Differential Revision: https://reviews.llvm.org/D108564
2021-08-23 09:45:23 -07:00
Jessica Paquette 6760e2a7bc [GlobalISel] Translate @llvm.llround.* -> G_LLROUND
Translate it using `IRTranslator::translateSimpleIntrinsic`.

Differential Revision: https://reviews.llvm.org/D108563
2021-08-23 09:42:53 -07:00
Florian Hahn 9577fac0fd
Revert "[InstCombine] generalize subtract with 'not' operands"
This reverts commit 3aa009cc87.

The reverted commit causes an infinite loop in instcombine. See PR51584.
2021-08-23 15:47:21 +01:00
Chuanqi Xu 2556f58148 [FuncSpec] Don't specialize function which are easy to inline
It would waste time to specialize a function which would inline finally.
This patch did two things:

- Don't specialize functions which are always-inline.
- Don't spescialize functions whose lines of code are less than threshold
(100 by default).

For spec2017int, this patch could reduce the number of specialized
functions by 33%. Then the compile time didn't increase for every
benchmark.

Reviewed By: SjoerdMeijer, xbolva00, snehasish

Differential Revision: https://reviews.llvm.org/D107897
2021-08-23 19:20:21 +08:00
Alexander Potapenko 8300d52e8c [tsan] Add support for disable_sanitizer_instrumentation attribute
Unlike __attribute__((no_sanitize("thread"))), this one will cause TSan
to skip the entire function during instrumentation.

Depends on https://reviews.llvm.org/D108029

Differential Revision: https://reviews.llvm.org/D108202
2021-08-23 12:38:33 +02:00
Florian Hahn d024a01511
Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts the revert ab9296f13b.

The issue causing the revert should be fixed in 9baed023b4.
2021-08-23 11:25:27 +01:00
Jay Foad 7a967d9011 [AMDGPU] Try to fix a GCC 11 warning
Apparently GCC 11 was warning:
AMDGPURegisterBankInfo.cpp:2543:33: warning: enumerated and non-enumerated type in conditional expression [-Wextra]
2021-08-23 10:51:37 +01:00
Cullen Rhodes fb82b836b7 [AArch64][SME] Support NEON scalar FP instructions in streaming mode
The following scalar FP instructions are legal in streaming mode:

  0101 1110 xx1x xxxx 11x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar)
  0101 1110 x10x xxxx 00x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar, FP16)
  01x1 1110 1x10 0001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar)
  01x1 1110 1111 1001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar, FP16)

Predicate them on `HasNEONorStreamingSVE`. Full list of affected
instructions:

  FMULX16, FMULX32, FMULX64, FRECPS16, FRECPS32, FRECPS64, FRSQRTS16,
  FRSQRTS32, FRSQRTS64, FRECPEv1f16, FRECPEv1i32, FRECPEv1i64, FRECPXv1f16,
  FRECPXv1i32, FRECPXv1i64, FRSQRTEv1f16, FRSQRTEv1i32, FRSQRTEv1i64

Depends on D107902.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions

Execution of NEON instructions that are illegal in streaming mode will
cause a trap or exception. Using FMULX [1] as an example, this check is
at the top of the pseudocode:

  if elements == 1 then
      CheckFPEnabled64();
  else
      CheckFPAdvSIMDEnabled64();

For the legal scalar variants it calls `CheckFPEnabled64`, whereas for the
illegal vector variants it calls `CheckFPAdvSIMDEnabled64` which traps.

This is useful for observing which instructions are/aren't legal
in streaming mode.

[1] https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions/FMULX--Floating-point-Multiply-extended-

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D108039
2021-08-23 08:48:34 +00:00
Cullen Rhodes cf3c6cca9f [AArch64][SME] Add predicate for NEON support in streaming mode
Split out from D107903 to remove dependency for D108039 and D108279.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108293
2021-08-23 08:48:33 +00:00
Kai Luo 7165e6713f [PowerPC] Use int64_t to represent stack object offset and frame size
This is the first step to enable PPC64 support huge frame size(>2G). Also fix an assertion error for frame size, i.e.,`int x; !isInt<32>(x);` should be always evaluated false, so the guard code for frame size is impossible to hit.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D107435
2021-08-23 02:13:21 +00:00
Nikita Popov 2b70b68efb [GVN] Don't short-circuit load PRE
4ad41902e8 changed this code to
propagate Changed if scalar GEP PRE is performed. However, as
implemented this would skip the load PRE entirely if GEP indices
were PREd. Make sure load PRE runs even if Changed is already
true.

This likely has no functional effect as load PRE would then
occur on a later GVN iteration.
2021-08-22 21:12:58 +02:00
Philip Reames d8d84c9df8 [runtimeunroll] Use early return to reduce nesting [nfc] 2021-08-22 11:34:50 -07:00
Philip Reames aec08e8600 Special case common branch patterns in breakLoopBackedge
This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability.  I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
2021-08-22 10:42:23 -07:00
Simon Pilgrim 805fb1f6c1 [X86] combineMul - move MUL_IMM comment inside function. NFC.
combineMul is now used for other things as well as the mul-with-constant expansion - move the comment to where its actually relevant.
2021-08-22 18:27:03 +01:00
Alexey Lapshin 07d44cc0b1 [DWARF][Verifier] Do not add child DieRangeInfo with empty address range to the parent.
verifyDieRanges function checks for the intersected address ranges.
It adds child DieRangeInfo into parent DieRangeInfo to check
whether children have overlapping address ranges. It is safe to not add
DieRangeInfo with empty address range into parent's children list.
This decreases the number of children which should be navigated and as a result
decreases execution time(parents having a lot of children with empty ranges
spend much time navigating them). For this command: "llvm-dwarfdump --verify clang-repl"
execution time decreased from 220 sec till 75 sec.

Differential Revision: https://reviews.llvm.org/D107554
2021-08-22 19:39:21 +03:00
Nikita Popov fafe5a6f44 [InstCombine] Perform "eq of parts" fold with logical ops
The pattern matched here is too complex for the general logical
and/or to bitwise and/or conversion to trigger. However, the
fold is poison-safe, so match it with a select root as well:

https://alive2.llvm.org/ce/z/vNzzSg
https://alive2.llvm.org/ce/z/Beyumt
2021-08-22 16:55:53 +02:00
Simon Pilgrim 352df10a23 [X86][AVX] matchShuffleAsBlend - use isElementEquivalent to help match broadcast/repeated elements
Extend matchShuffleAsBlend to not only match against known in-place elements for BLEND shuffles, but use isElementEquivalent to determine if the shuffle mask's referenced element is the same as the in-place element.

This allows us to replace a number of insertps instructions with more general blendps instructions (better opportunities for commutation, concatenation etc.).
2021-08-22 15:26:17 +01:00
Simon Pilgrim 96fb3eef66 Fix signed/unsigned comparison warning. NFCI. 2021-08-22 15:02:19 +01:00
Simon Pilgrim a1c892b439 [X86][SSE] lowerVECTOR_SHUFFLE - canonicalize with horizontal ops.
Before lowering shuffles, see if we can merge horizontal ops or canonicalize the shuffle mask to point to the same LHS/RHS of the HOps when an HOp's args are repeated.
2021-08-22 14:54:48 +01:00
Sanjay Patel dcf659e821 [InstSimplify] fold rotate of -1 to -1
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/GpkFCt
2021-08-22 09:15:48 -04:00
Sanjay Patel d41e308f10 [InstSimplify] fold rotate of zero to zero
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/fjKwqv
2021-08-22 09:15:48 -04:00
Simon Pilgrim 8533e782ef [X86] Try to sync HSW + BDW model class defs to simplify comparisons. NFC.
Broadwell is mainly a die shrink of Haswell, but the model had many of the scheduling classes in different orders, making side-by-side comparisons very difficult.

The InstRW overrides are still quite different, but at least that part of the side-by-side diff is now in the same position.

This was noticed while I was trying to investigate diffs between llvm-mca and other perf analyzers in https://uica.uops.info/ - we used to be able to do diffs between most of the models very easily, but we seem to have lost that simplicity as classes have been altered, models have been refined and other models have rotted.
2021-08-22 13:02:51 +01:00
Sanjay Patel 3aa009cc87 [InstCombine] generalize subtract with 'not' operands
The motivation was to get min/max intrinsics to parity
with cmp+select idioms, but this unlocks a few more
folds because isFreeToInvert recognizes add/sub with
constants too.

In the min/max example, we have too many extra uses
for smaller folds to improve things, but this fold
is able to eliminate uses even though we can't reduce
the number of instructions.
2021-08-22 07:18:31 -04:00
Florian Hahn 9baed023b4
[LV] Adjust reduction recipes before recurrence handling.
Adjusting the reduction recipes still relies on references to the
original IR, which can become outdated by the first-order recurrence
handling. Until reduction recipe construction does not require IR
references, move it before first-order recurrence handling, to prevent a
crash as exposed by D106653.
2021-08-22 11:02:33 +01:00
Ben Shi f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
luxufan dda116bc3d [JITLink] Add support of R_X86_64_32S relocation
This patch supported the R_X86_64_32S relocation and add the Pointer32Signed generic edge kind.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108446
2021-08-22 16:45:25 +08:00
Wang, Pengfei b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Fangrui Song 1dfb30e54c [TargetCallingConv] Change OutputArg ctor to match its members
This avoids unneeded MVT->EVT conversion.
2021-08-21 16:41:48 -07:00
Fangrui Song 0473e9f41a [AArch64] Replace unneeded CCAssignToRegWithShadow with CCAssignToReg
CCState::AllocateReg handles aliased registers.
2021-08-21 16:33:29 -07:00
Fangrui Song a83d99c55e [TargetMachine] Drop special case for *-win32-macho
clang CodeGenModule shouldAssumeDSOLocal has set dso_local.
2021-08-21 13:59:17 -07:00
Fangrui Song c5ee312368 [TargetMachine] Simplify shouldAssumeDSOLocal. NFC 2021-08-21 12:37:29 -07:00
Sanjay Patel 41af8f0ad5 [InstCombine] combine constants by reassociating add/sub/add
This may overlap partially with the reassociate pass,
but it seems simple enough that we should try it here
in InstCombine to enable other folds.

This shows up as an opportunity and potential regression
if we improve a subtract fold with 'not' ops to be more
general.
2021-08-21 11:45:43 -04:00
David Green 605489d593 [ARM] Fix VQDMULH fold for scalar smin
Add a variant of mve-vqdmulh tests that uses min/max intrinsics
directly, including a scalar test that shows it misbehaving for min
intrinsics and a fix for the combine to prevent it from misbehaving.
2021-08-21 16:33:18 +01:00
Lang Hames 7f99337f9b [ORC] Add EPCGenericMemoryAccess: generic executor memory access via EPC calls.
All ExecutorProcessControl subclasses must provide an
ExecutorProcessControl::MemoryAccess object that can be used to access executor
memory from the JIT process. The EPCGenericMemoryAccess class provides an
off-the-shelf MemoryAccess implementation for JITs that do not need (or cannot
provide) a specialized MemoryAccess implementation. This simplifies the process
of creating new ExecutorProcessControl implementations.
2021-08-21 19:33:39 +10:00
eopXD 4fc98ca617 [NFC][LoopIdiom] Let processLoopStoreOfLoopLoad take StoreSize as SCEV instead of unsigned
Letting it take SCEV allows further modification on the function to optimize
if the StoreSize / Stride is runtime determined.

The plan is to let memcpy / memmove deal with runtime-determined sizes, just
like what D107353 did to memset.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D108289
2021-08-21 00:03:28 -07:00
Amara Emerson 3187a4f3f1 [AArch64][GlobalISel] Add legalizer support for the @llvm.get.dynamic.area.offset intrinsic.
This is just 0 on AArch64.
2021-08-20 17:13:34 -07:00
Amara Emerson 67bf3ac744 [AArch64][GlobalISel] Don't contract cross-bank copies into truncating stores.
Truncating stores with GPR bank sources shouldn't be mutated into using FPR bank
sources, since those aren't supported.

Ideally this should be a selection failure in the tablegen patterns, but for now
avoid generating them.
2021-08-20 16:36:23 -07:00
Jessica Paquette 9e9d70591e [AArch64][GlobalISel] Legalize non-register-sized scalar G_BITREVERSE
Clamp types to [s32, s64] and make them a power of 2.

This matches SDAG's behaviour.

https://godbolt.org/z/vTeGqf4vT

Differential Revision: https://reviews.llvm.org/D108344
2021-08-20 14:44:03 -07:00
Jessica Paquette 7e91c59844 [AArch64][GlobalISel] Legalize 32-bit + narrow G_SMULO + G_UMULO
SDAG lowers 32-bit and 64-bit G_SMULO + G_UMULO. We were missing the 32-bit
case.

For other sizes, make the 0th type a power of 2 and clamp it to either 32 bits
or 64 bits.

Right now, this will allow us to handle narrow types (e.g. s4, s24, etc.). The
LegalizerHelper doesn't support narrowing G_SMULO or G_UMULO right now. I think
we want clamping behaviour either way, so we might as well include it now to
be explicit.

Differential Revision: https://reviews.llvm.org/D108240
2021-08-20 14:37:46 -07:00
Jessica Paquette 16caf6321c [AArch64][GlobalISel] Clamp vectors of p0 when legalizing G_LOAD/G_STORE
We had a rule for <n x s64> but not one for <n x p0>. As a result, we'd fall
back on like <5 x p0> or whatever.

Differential Revision: https://reviews.llvm.org/D108484
2021-08-20 14:34:49 -07:00
Jessica Paquette 470c74f181 [AArch64][GlobalISel] Add regbankselect support for G_LROUND
Destination is always a GPR, since the result is always an integer.

Source is always a FPR, since the source is always floating point.

Differential Revision: https://reviews.llvm.org/D108419
2021-08-20 14:31:14 -07:00
Jessica Paquette 44bf0dc625 [AArch64][GlobalISel] Mark G_LROUND as legal for s64 dst + s32/s64 src.
Matches SDAG's behaviour for these types.

Differential Revision: https://reviews.llvm.org/D108420
2021-08-20 14:22:58 -07:00
Jessica Paquette af8e09d4bb [GlobalISel] Add G_LLROUND
Basically the same as G_LROUND. Handles the llvm.llround family of intrinsics.

Also add a helper function to the MachineVerifier for checking if all of the
(virtual register) operands of an instruction are scalars. Seems like a useful
thing to have.

Differential Revision: https://reviews.llvm.org/D108429
2021-08-20 14:07:21 -07:00
Nikita Popov 0afd10b403 [LoopPassManager] Assert that MemorySSA is preserved if used
Currently it's possible to silently use a loop pass that does not
preserve MemorySSA in a loop-mssa pass manager, as we don't
statically know which loop passes preserve MemorySSA (as was the
case with the legacy pass manager).

However, we can at least add a check after the fact that if
MemorySSA is used, then it should also have been preserved.
Hopefully this will reduce confusion as seen in
https://bugs.llvm.org/show_bug.cgi?id=51020.

Differential Revision: https://reviews.llvm.org/D108399
2021-08-20 22:48:04 +02:00
Mircea Trofin 8dc3fe0cd1 [NFC][MLGO] Use std::move when moving protobufs
Because of an odd linking problem, we need to temporarily support
building with TF C API 1.15 + tensorflow 2.50 pip package in
'development' mode scenarios. Protobuf Message 'Swap' is partially
implemented in the header (2.50) and relies on a symbol not found in TF
C API 1.15. std::move avoids that, at no semantic cost.
2021-08-20 13:40:35 -07:00
Florian Hahn ab9296f13b
Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts commit f4122398e7 to
investigate a crash exposed by it.

The patch breaks building the code below with `clang -O2 --target=aarch64-linux`

     int a;
     double b, c;
     void d() {
       for (; a; a++) {
         b += c;
         c = a;
       }
     }
2021-08-20 21:24:28 +01:00
Daniel Paoliello 8ecce69594 Fix SEH table addresses for Windows
Issue Details:
The addresses for SEH tables for Windows are incorrect as 1 was unconditionally being added to all addresses. +1 is required for the SEH end address (as it is exclusive), but the SEH start addresses is inclusive and so should be used as-is.

In the IP2State tables, the addresses are +1 for AMD64 to handle the return address for a call being after the actual call instruction but are as-is for ARM and ARM64 as the `StateFromIp` function in the VC runtime automatically takes this into account and adjusts the address that is it looking up.

Fix Details:
* Split the `getLabel` function into two: `getLabel` (used for the SEH start address and ARM+ARM64 IP2State addresses) and `getLabelPlusOne` (for the SEH end address, and AMD64 IP2State addresses).

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D107784
2021-08-20 22:32:12 +03:00
Craig Topper 10020d41ee [TypePromotion] Remove unused IRBuilder object. NFC 2021-08-20 12:20:09 -07:00
Yonghong Song 430e223881 [DebugInfo] generate btf_tag annotations for DIDerived types
Generate btf_tag annotations for DIDrived types. More specifically,
clang frontend generates the btf_tag annotations for record
fields. The annotations are represented as an DINodeArray
in DebugInfo. The following example illustrate how
annotations are encoded in IR:
      distinct !DIDerivedType(tag: DW_TAG_member, ..., annotations: !10)
      !10 = !{!11, !12}
      !11 = !{!"btf_tag", !"a"}
      !12 = !{!"btf_tag", !"b"}

Differential Revision: https://reviews.llvm.org/D106616
2021-08-20 12:06:37 -07:00
Aditya Kumar b8e345b266 PR46874: Reset stack after visiting a node
When the stack is not reset it keeps previously visited Basic Block
which results in bugs where an instruction is hoisted to a
predecessor where the instruction was not fully anticipable.

Differential Revision: https://reviews.llvm.org/D108425
2021-08-20 11:25:05 -07:00
Arthur Eubanks d7df812740 [NFC] Cleanup/remove some AttributeList setter methods 2021-08-20 10:38:35 -07:00
Patrick Holland 3b3c01348b [MCA] Fixing bug that was causing LSUnit not to realize an instruction finished executing when the instruction has 0 latency.
Differential Revision: https://reviews.llvm.org/D108443
2021-08-20 10:32:37 -07:00
Arthur Eubanks 0f45c16f2c [NFC] Remove some unused functions 2021-08-20 09:46:30 -07:00
Andrea Di Biagio 35d4292a73 [X86][SchedModels] Fix missing ReadAdvance for MULX and ADCX/ADOX (PR51494)
Before this patch, instructions MULX32rm and MULX64rm were missing a ReadAdvance
for the implicit read of register EDX/RDX.  This patch fixes the issue, and it
also introduces a new SchedWrite for the two variants of MULX. The general idea
behind this last change is to eventually decrease the number of InstRW in the
scheduling models.

This patch also adds a ReadAdvance for the implicit read of EFLAGS in ADCX/ADOX.

Differential Revision: https://reviews.llvm.org/D108372
2021-08-20 17:39:51 +01:00
Sanjay Patel dd19f342fa [AggressiveInstCombine] guard against applying instruction flags with constant folding
This is a minimized version of a crash reported in:
D108201
2021-08-20 12:22:18 -04:00
Thomas Lively 88962cea46 [WebAssembly] Restore builtins and intrinsics for pmin/pmax
Partially reverts 85157c0079, which had removed these builtins and intrinsics
in favor of normal codegen patterns. It turns out that it is possible for the
patterns to be split over multiple basic blocks, however, which means that DAG
ISel is not able to select them to the pmin/pmax instructions. To make sure the
SIMD intrinsics generate the correct instructions in these cases, reintroduce
the clang builtins and corresponding LLVM intrinsics, but also keep the normal
pattern matching as well.

Differential Revision: https://reviews.llvm.org/D108387
2021-08-20 09:21:31 -07:00
Ben Shi 5b6c9a5ab0 [RISCV] Optimize add in the zba extension with SH*ADD
Optimize (add x, c) to (SH*ADD (c>>b), x) if c is not simm12
while (c>>b) is simm12 and c has b trailing zeros.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D108193
2021-08-20 22:41:49 +08:00
Kirill Stoimenov 05a8c0b5f8 [asan] Implemented getAddressSanitizerParams used by the ASan callback optimization code.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D108397
2021-08-20 14:17:07 +00:00
Jeremy Morse ce8254d096 [DebugInfo][InstrRef] Correctly ignore DBG_VALUE_LIST in InstrRef mode
This patch makes InstrRefBasedLDV "safe" to work with DBG_VALUE_LISTs. It
doesn't actually interpret them, but it recognises that they specify
variable locations and avoids propagating false locations, which is better
than the current state. Observe the attached tes

 * We avoid propagating DBG_VALUE_LISTs into successor blocks, as they're
   not "currently" supported,
 * We don't propagate other variable locations across DBG_VALUE_LISTs,
   because we know that the variable location is terminated by the
   DBG_VALUE_LIST.

Differential Revision: https://reviews.llvm.org/D108143
2021-08-20 14:51:02 +01:00
Simon Pilgrim 9efda541bf [CostModel][X86] Add costs for f32/f64 scalar and vector types.
The f16 half types are still pretty useless as we don't have it as a legal type (we treat them as i16 most of the time)
2021-08-20 14:31:12 +01:00
Simon Pilgrim c1f3bab23b MainSwitch::isValidSelectInst - don't dereference dyn_cast<> results.
We've already checked that the pointer isa<PHINode>, so we can use cast<Instruction> safely.

Fixes static analyser warning.
2021-08-20 14:31:11 +01:00
Jeremy Morse c76c24e40b [DebugInfo][InstrRef] Remove a faulty assertion
This patch removes an assertion, and adds a regression test showing why the
assertion is broken.

For context, LocIdx is a key/index number for machine locations, so that we
can describe locations as a single integer and ignore whether they're on
the stack, in registers or otherwise. Back when InstrRefBasedLDV was added,
I happened to bake in a "special" zero number for various reasons, which
Vedant identified as undesirable in this review comment:
https://reviews.llvm.org/D83047#inline-765495 . I subsequently removed that
special zero number, but it looks like I didn't delete this assertion at
the time, which assumes that a zero LocIdx is invalid.

The attached test shows that this assertion is reachable on valid code --
on x86 $rsp always gets the LocIdx number zero, and if you transfer a
variable value into it, InstrRefBasedLDV crashes on that assertion. The
code might be a bit wild to be storing variables to $rsp like that, however
we shouldn't crash on it.

Differential Revision: https://reviews.llvm.org/D108134
2021-08-20 14:23:32 +01:00
Alexander Potapenko 8dc7dcdca1 [msan] Add support for disable_sanitizer_instrumentation attribute
Unlike __attribute__((no_sanitize("memory"))), this one will cause MSan
to skip the entire function during instrumentation.

Depends on https://reviews.llvm.org/D108029

Differential Revision: https://reviews.llvm.org/D108199
2021-08-20 15:11:26 +02:00
Bjorn Pettersson d52f506192 [NewPM] Use parameterized syntax for a couple of more passes
A couple of passes that are parameterized in new-PM used different
pass names (in cmd line interface) while using the same pass class
name. This patch updates the PassRegistry to model pass parameters
more properly using PASS_WITH_PARAMS.

Reason for the change is to ensure that we have a 1-1 mapping
between class name and pass name (when disregarding the params).
With a 1-1 mapping it is more obvious which pass name to use in
options such as -debug-only, -print-after etc.

The opt -passes syntax is changed for the following passes:
  early-cse-memssa => early-cse<memssa>
  post-inline-ee-instrument => ee-instrument<post-inline>
  loop-extract-single => loop-extract<single>
  lower-matrix-intrinsics-minimal => lower-matrix-intrinsics<minimal>

This patch is not updating pass names in docs/Passes.rst. Not quite
sure what the status is for that document (e.g. when it comes to
listing pass paramters). It is only loop-extract-single that is
mentioned in Passes.rst today, out of the passes mentioned above.

Differential Revision: https://reviews.llvm.org/D108362
2021-08-20 14:59:21 +02:00
Alexander Potapenko b0391dfc73 [clang][Codegen] Introduce the disable_sanitizer_instrumentation attribute
The purpose of __attribute__((disable_sanitizer_instrumentation)) is to
prevent all kinds of sanitizer instrumentation applied to a certain
function, Objective-C method, or global variable.

The no_sanitize(...) attribute drops instrumentation checks, but may
still insert code preventing false positive reports. In some cases
though (e.g. when building Linux kernel with -fsanitize=kernel-memory
or -fsanitize=thread) the users may want to avoid any kind of
instrumentation.

Differential Revision: https://reviews.llvm.org/D108029
2021-08-20 14:01:06 +02:00
Simon Pilgrim 5d21ee4224 MemProfilerPass::run - remove (dead) duplicate return. NFC. 2021-08-20 12:36:28 +01:00
Tim Northover 3d41ef68e7 AArch64: don't form indexed paired ops if base reg overlaps operands.
The registers involved might not be identical, but can still overlap (e.g.
"str w0, [x0, #4]!").
2021-08-20 11:39:38 +01:00
Roman Lebedev 5d4f37e895
[NFCI][SimplifyCFG] Rewrite `createUnreachableSwitchDefault()`
The only thing that function should do as per it's semantic,
is to ensure that the switch's default is a block consisting only of
an `unreachable` terminator.

So let's just create such a block and update switch's default
to point to it. There should be no need for all this weird dance
around predecessors/successors.
2021-08-20 13:28:08 +03:00
Jingu Kang 94c4952951 [AArch64] Enable Upper bound unrolling universally
Differential Revision: https://reviews.llvm.org/D105996
2021-08-20 11:25:38 +01:00
Fraser Cormack 5b06cbac11 [RISCV] Fix reporting of incorrect commutable operand indices
This patch fixes an issue where RISCV's `findCommutedOpIndices` would
incorrectly return the pseudo `CommuteAnyOperandIndex` as a commutable
operand index, rather than fixing a specific index.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D108206
2021-08-20 10:27:15 +01:00
Kazu Hirata 49d7b2beae [DWARF] Remove parseListTableHeader (NFC)
The last use was removed on Oct 4, 2020 in commit
6d0be74af5.
2021-08-19 23:34:22 -07:00
Sebastian Neubauer f3fe44fa05 [AMDGPU] Fix too many constants with flat scratch
Prevent SIFoldOperands from creating SALU instructions with a constant
and a frame index. Previously, only one operand was checked to be a
frame index, leading to too many constants when flat scratch is enabled
and stack offsets are large.

Differential Revision: https://reviews.llvm.org/D108368
2021-08-20 08:21:36 +02:00
Lang Hames 4290d0fed0 [ORC] Add 'Async' suffix to ExecutorProcessControl::MemoryAccess methods.
This prevents the async methods (which shoud be overridden by subclasses) from
hiding the blocking helper methods, avoiding a lot of 'using MemoryAccess::...'
boilerplate.
2021-08-20 15:12:20 +10:00
Lang Hames 642885710e [ORC] Introduce lookupAndRecordAddrs utility.
Accepts a vector of (SymbolStringPtr, ExecutorAddress*) pairs, looks up all the
symbols, then writes their address to each of the corresponding
ExecutorAddresses.

This idiom (looking up and recording addresses into a specific set of variables)
is used in MachOPlatform and the (temporarily reverted) ELFNixPlatform, and is
likely to be used in other places in the near future, so wrapping it in a
utility function should save us some boilerplate.
2021-08-20 15:12:19 +10:00
Anton Afanasyev 3890ce708d [NFC][AggressiveInstCombine] Simplify code for shift truncation 2021-08-20 06:37:02 +03:00
Anshil Gandhi 508b06699a [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions
Produce remarks when atomic instructions are expanded into hardware instructions
in SIISelLowering.cpp. Currently, these remarks are only emitted for atomic fadd
instructions.

Differential Revision: https://reviews.llvm.org/D108150
2021-08-19 20:51:19 -06:00
Maryam Benimmar 2cdfd0b259 [AIX][XCOFF] 64-bit relocation reading support
Support XCOFFDumper relocation reading support
This patch is part of D103696 partition

Reviewed By: daltenty, Helflym

Differential Revision: https://reviews.llvm.org/D104646
2021-08-19 21:56:57 -04:00
Yonghong Song 0b32dca12e Reland [DebugInfo] generate btf_tag annotations for DIComposite types
Clang patch D106614 added attribute btf_tag support. This patch
generates btf_tag annotations for DIComposite types.
A field "annotations" is introduced to DIComposite, and the
annotations are represented as an DINodeArray, similar to
DIComposite elements. The following example illustrates
how annotations are encoded in IR:
  distinct !DICompositeType(..., annotations: !10)
  !10 = !{!11, !12}
  !11 = !{!"btf_tag", !"a"}
  !12 = !{!"btf_tag", !"b"}
Each btf_tag annotation is represented as a 2D array of
meta strings. Each record may have more than one
btf_tag annotations, as in the above example.

Reland with additional fixes for llvm/unittests/IR/DebugTypeODRUniquingTest.cpp.

Differential Revision: https://reviews.llvm.org/D106615
2021-08-19 17:33:50 -07:00
Jessica Paquette 3207ed196c [GlobalISel] Add IRTranslator support for @llvm.lround.* -> G_LROUND
Translate the `@llvm.lround.*` family to G_LROUND via
`IRTranslator::translateSimpleIntrinsic`.

Differential Revision: https://reviews.llvm.org/D108418
2021-08-19 17:08:08 -07:00
Jessica Paquette 3118926483 [GlobalISel] Add a G_LROUND instruction
Meant to represent the `@llvm.lround.*` family.

Add the opcode, docs, and verification.

Differential Revision: https://reviews.llvm.org/D108417
2021-08-19 17:06:24 -07:00
Fangrui Song 77b435aaa1 Revert "[InstrProfiling] Make COFF use the ELF comdat scheme (drop link.exe compatibility)"
This reverts commit fbb8e772ec.

Accidentally pushed.
2021-08-19 16:42:57 -07:00
Amara Emerson 95ac3d15e9 [AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization.
For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize
completely if the source is <= 64b. This change adds support for that in
the legalizer. If the source has a pow-2 num elements, then we can do
a tree reduction using the scalar operation in the individual elements.
Otherwise, we just create a sequential chain of operations.

For AArch64, we only need to scalarize if the input is <64b. If it's great than
64b then we can first do a fewElements step to 64b, taking advantage of vector
instructions until we reach the point of scalarization.

I also had to relax the verifier checks for reductions because the intrinsics
support <1 x EltTy> types, which we lower to scalars for GlobalISel.

Differential Revision: https://reviews.llvm.org/D108276
2021-08-19 16:38:52 -07:00
Fangrui Song fbb8e772ec [InstrProfiling] Make COFF use the ELF comdat scheme (drop link.exe compatibility)
The COFF specific `DataReferencedByCode` complexity (D103372 D103717) is due to
a link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE is
not really dropped, so it can cause duplicate definition error.
2021-08-19 16:38:32 -07:00
Amara Emerson a0051f7149 [AArch64][GlobalISel] Fix miscompile of <16 x s8> G_EXTRACT_VECTOR_ELT.
When support for copying vector s8 lanes was added recently, this also
had the side effect of fixing a fallback for <16 x s8> extracts since
both used the same helper. However, there was a bug in another helper
to get the regclass for a specific FPR-native type, which was assigning
FPR16 to s8 instead of FPR8.
2021-08-19 16:22:32 -07:00
Thomas Lively be6c49e743 [WebAssembly] Add explicit casts to silence -Wc++11-narrowing 2021-08-19 16:00:07 -07:00
Yonghong Song c1169b8bd3 Revert "[DebugInfo] generate btf_tag annotations for DIComposite types"
This reverts commit 2fded193e7.

Builtbot reports some test failures. Revert now so I can take time
to fix the issues.
2021-08-19 15:54:38 -07:00
Yonghong Song 2fded193e7 [DebugInfo] generate btf_tag annotations for DIComposite types
Clang patch D106614 added attribute btf_tag support. This patch
generates btf_tag annotations for DIComposite types.
A field "annotations" is introduced to DIComposite, and the
annotations are represented as an DINodeArray, similar to
DIComposite elements. The following example illustrates
how annotations are encoded in IR:
  distinct !DICompositeType(..., annotations: !10)
  !10 = !{!11, !12}
  !11 = !{!"btf_tag", !"a"}
  !12 = !{!"btf_tag", !"b"}
Each btf_tag annotation is represented as a 2D array of
meta strings. Each record may have more than one
btf_tag annotations, as in the above example.

Differential Revision: https://reviews.llvm.org/D106615
2021-08-19 15:37:44 -07:00
Thomas Lively fd0557dbf1 [WebAssembly] More convert_low and promote_low codegen
The convert_low and promote_low instructions can widen the lower two lanes of a
four-lane vector, but we were previously scalarizing patterns that widened lanes
besides the low two lanes. The commit adds a shuffle to move the widened lanes
into the low lane positions so the convert_low and promote_low instructions can
be used instead of scalarizing.

Depends on D108266.

Differential Revision: https://reviews.llvm.org/D108341
2021-08-19 15:37:12 -07:00
Thomas Lively b311a040ef [WebAssembly] Pattern match SIMD convert_low and promote_low during ISel
Since the simplest DAG patterns for convert_low and promote_low instructions
involved v2i32, v2f32, v4i64, and v4f64 types, which are not legal in the
WebAssembly backend and would be eliminated by type legalization, we were
previously matching those patterns in a DAG combine before the type legalization
stage. However in cases where the vectors were wider than 128 bits, the patterns
we matched were not created until the type legalization stage when the wide
vectors were split up. Type legalization would continue to eliminate the illegal
types we were matching as well, so the code ended up scalarized.

To make the ISel for these instructions more robust, match the scalarized
patterns rather than the patterns containing illegal types. Add tests with
double-wide vectors to show that this works as intended.

Fixes PR51098.
Depends on D107502.

Differential Revision: https://reviews.llvm.org/D108266
2021-08-19 15:24:28 -07:00
Akira Hatanaka 898dc4590c Refactor inlineRetainOrClaimRVCalls. NFC
This is in preparation for committing https://reviews.llvm.org/D103000.
2021-08-19 14:55:45 -07:00
Arthur Eubanks 7c8206cd2a [NFC] Cleanup AttributeList::getStackAlignment()
So that we don't use a confusing index.
2021-08-19 14:21:40 -07:00
Alexandre Rames cd28003336 [Support] Update `MD5` to follow other hashes.
Introduce `StringRef final()` and `StringRef result()`.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D107781
2021-08-19 14:13:14 -07:00
Arthur Eubanks 44a3241f10 [NFC] Replace some attribute methods that use confusing indexes 2021-08-19 14:10:26 -07:00
Alexandre Rames 10a126325d [NFC][Support] Move `MD5` members in `InternalState`.
This prepares an update to follow other hashes.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D108388
2021-08-19 14:04:14 -07:00
Florian Mayer 73323c6eaa [hwasan] re-enable stack safety by default.
The failed assertion was fixed in D108337.

Reviewed By: vitalybuka, eugenis

Differential Revision: https://reviews.llvm.org/D108381
2021-08-19 21:11:24 +01:00
Adrian Prantl 1e586bcc3e Move function definition out-of-line to fix the modularized build (NFC) 2021-08-19 12:26:23 -07:00
Thomas Lively b69374ca58 [WebAssembly] Legalize vector types by widening
The default legalization of unsupported vector types is to promote the integers
in each lane, which leads to extra sign or zero extending and masking when
moving data into and out of vectors. Switch our preferred type legalization from
the default to vector widening, which keeps the data in the low lanes of the
vector rather than in the low bits of each lane. The unused high lanes can be
ignored.

Half-wide vectors are now loaded from memory into the low 64 bits of the v128
rather than spread out among the lanes. As a result, v128.load64_splat is a much
more common operation, so add new patterns to support it.

Differential Revision: https://reviews.llvm.org/D107502
2021-08-19 12:07:33 -07:00
Philip Reames 17b9cb1817 [runtimeunroll] Support multiple exits to latch exit w/prolog loop
This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used.

This is the prolog companion to D107381. Since this was LGTMed, a problem with DT updating was reported against that patch.  I roled in the analogous fix here as it seemed obvious, and not worth re-review.

As an aside, our prolog form leaves a lot of potential value on the floor when there is an invariant load or invariant condition in the loop being runtime unrolled. We should probably consider a "required prolog" heuristic.  (Alternatively, maybe we should be peeling these cases more aggressively?)

Differential Revision: https://reviews.llvm.org/D108262
2021-08-19 11:43:52 -07:00
Stanislav Mekhanoshin 8d7d89b081 [AMDGPU] Add alias.scope metadata to lowered LDS struct
Alias analysis is unable to disambiguate accesses to the structure
fields without it unlike distinct variables. As a result we cannot
combine ds_read and ds_write operations in a case of any store in
between which always considered clobbering.

Differential Revision: https://reviews.llvm.org/D108315
2021-08-19 11:40:30 -07:00
Nikita Popov 8cf5b69f69 [GuardWidening] Preserve MemorySSA
As reported on https://bugs.llvm.org/show_bug.cgi?id=51020, the
guard widening pass doesn't preserve MemorySSA, so it can no
longer be scheduled in the same loop pass manager as LICM. However,
the loop-schedule.ll test indicates that this is supposed to work.

Fix this by preserving MemorySSA if available, as this seems to be
trivial in this case (we only need to drop the memory access for
the removed guards).

Differential Revision: https://reviews.llvm.org/D108386
2021-08-19 20:23:17 +02:00
Philip Reames 447256f22b [runtimeunroll] Fix reported DT verification error after 94d0914
In 94d0914, I added support for unrolling of multiple exit loops which have multiple exits reaching the latch.  Per reports on the review post commit, I'd missed updating the domtree for one case.  This fix addresses that ommission.

There's no new test as this is covered by existing tests with expensive verification turned on.
2021-08-19 11:06:17 -07:00
Tim Northover edab411ee6 AArch64: copy all parts of the mem operand across when combining a store
In particular we were dropping volatility, which can lead to unwanted
transformations.
2021-08-19 18:26:39 +01:00
Chang-Sun Lin, Jr 9cae598f8b
[InstCombine] Avoid folding GEPs across loop boundaries
Folding a GEP from outside to inside a loop will materialize an add where there wasn't an equivalent operation before. Check the containing loops before making this fold.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107935
2021-08-19 20:03:44 +03:00
Arthur Eubanks 33d44b762e [OpaquePtr][Inline] Use byval type instead of pointee type
Reviewed By: #opaque-pointers, dblaikie

Differential Revision: https://reviews.llvm.org/D105711
2021-08-19 09:56:08 -07:00
Owen Anderson 06a4c85890 Use v16i8 rather than v2i64 as the VT for memset expansion on AArch64.
This allows the instruction selector to realize that it can directly
broadcast the low byte of the memset value, rather than replicating
it to a 64-bit GPR before broadcasting.

This fixes PR50985.

Differential Revision: https://reviews.llvm.org/D108354
2021-08-19 16:54:07 +00:00
Augie Fackler e59c88294b MemoryBuiltins: trailing , on collection literal
This was probably bugging more than is reasonable, but it makes merging
changes in this file slightly less annoying to have the trailing comma
here. I only noticed this because Rust is currently carrying a patch to
this file and it kept making life a little difficult.
2021-08-19 17:59:23 +02:00
Craig Topper 84cea602f9 Revert "[SelectionDAGBuilder] Compute and cache PreferredExtendType on demand."
This reverts commit add08c8741.

There was a compile time jump on tramp3d-v4 on https://llvm-compile-time-tracker.com/
Want to see if it goes away with this reverted.
2021-08-19 08:42:05 -07:00
David Green d10f23a25d [ISel] Expand saddsat and ssubsat via asr and xor
This changes the lowering of saddsat and ssubsat so that instead of
using:
  r,o = saddo x, y
  c = setcc r < 0
  s = c ? INTMAX : INTMIN
  ret o ? s : r
into using asr and xor to materialize the INTMAX/INTMIN constants:
  r,o = saddo x, y
  s = ashr r, BW-1
  x = xor s, INTMIN
  ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD

This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.

Differential Revision: https://reviews.llvm.org/D105853
2021-08-19 16:08:07 +01:00
Simon Pilgrim 9419729b6a [CostModel][X86] Add VPOPCNTDQ/BITALG ctpop costs
VPOPCNTDQ + BITALG add ctpop instructions for vXi64/vXi32 + vXi16/vXi8 vector types respectively
2021-08-19 15:40:09 +01:00
Craig Topper add08c8741 [SelectionDAGBuilder] Compute and cache PreferredExtendType on demand.
Previously we pre-calculated this and cached it for every
instruction in the function. Most of the calculated results will
never be used. So instead calculate it only on the first use, and
then cache it.

The cache was originally added to fix a compile time issue which
caused r216066 to be reverted.

This change exposed that we weren't pre-computing the Value for
Arguments. I've explicitly disabled that for now as it seemed to
regress some tests on AArch64 which has sext built into its compare
instructions.

Spotted while investigating how to improve heuristics to work better
with RISCV preferring sign extend for unsigned compares for i32 on RV64.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107976
2021-08-19 07:18:33 -07:00
Craig Topper c60a4c1ba5 [TypePromotion] Use Instruction* instead of Value* for a couple functions. NFC
This matches how they are called and allows some isa/cast/dyn_cast
to be removed.

Differential Revision: https://reviews.llvm.org/D108333
2021-08-19 07:09:38 -07:00
Craig Topper 36d8316cc8 [RISCV] Reduce duplicate code for calling SimplifyDemandedBits.
This encapsulates the APInt creation and worklist management into
a helper function.

To keep one common interface I've use Log2_32 in places that
previously created a mask by subtracting 1 from a power of 2.

Differential Revision: https://reviews.llvm.org/D108324
2021-08-19 07:09:38 -07:00
Alexey Lapshin ab9d506be3 [DWARF][Verifier][NFC] Use reference to DWARFAddressRangesVector to avoid copying.
Avoid copying while access to RangesOrError.get().
2021-08-19 16:23:05 +03:00
Sanjay Patel ec54e275f5 Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."
This reverts commit 9934a5b2ed.
This patch may cause miscompiles because it missed a constraint
as shown in the examples from:
https://llvm.org/PR51531
2021-08-19 08:43:51 -04:00
Sanjay Patel eee0ded337 [InstCombine] add min/max intrinsics as freely invertible candidates
In the optimized test, we are able to peak through the
min/max that has 2 min/max operands and invert them all:
https://alive2.llvm.org/ce/z/7gYMN5
2021-08-19 08:41:38 -04:00
Sanjay Patel e10c3beca5 [InstCombine] add one-use check for min/max fold with not operands; NFC
This makes the intrinsic logic match the cmp+select idiom folds
just below. It's not clearly a win either way unless we think
that a 'not' op costs more than min/max.

The cmp+select folds on these patterns are more extensive than
the intrinsics currently and may have some complicated interactions,
so I'm trying to make those line up and bring the optimizations
for intrinsics up to parity.
2021-08-19 08:41:38 -04:00
Rosie Sumpter d1aa075129 [LoopFlatten] Fix assertion failure
There is an assertion failure in computeOverflowForUnsignedMul
(used in checkOverflow) due to the inner and outer trip counts
having different types. This occurs when the IV has been widened,
but the loop components are not successfully rediscovered.
This is fixed by some refactoring of the code in findLoopComponents
which identifies the trip count of the loop.

Differential Revision: https://reviews.llvm.org/D108107
2021-08-19 13:18:57 +01:00
Fraser Cormack e6b1ac8546 [LegalizeTypes][VP] Add widening support for binary VP ops
This patch adds the beginnings of more thorough support in the
legalizers for vector-predicated (VP) operations.

The first step is the ability to widen illegal vectors. The more
complicated scenario in which the result/operands need widening but the
mask doesn't has not been handled here. That would require a lot of code
without an in-tree target on which to test it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107904
2021-08-19 13:08:47 +01:00
Matthew Devereau 734708e04f [AArch64][SVE] Teach cost model that masked loads/stores are cheap
Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.
2021-08-19 13:01:33 +01:00
Frederik Gossen c20cb5547d Avoid unused variable when NDEBUG 2021-08-19 13:00:16 +02:00
Bjorn Pettersson 36d5138619 [NewPM] Make some sanitizer passes parameterized in the PassRegistry
Refactored implementation of AddressSanitizerPass and
HWAddressSanitizerPass to use pass options similar to passes like
MemorySanitizerPass. This makes sure that there is a single mapping
from class name to pass name (needed by D108298), and options like
-debug-only and -print-after makes a bit more sense when (despite
that it is the unparameterized pass name that should be used in those
options).

A result of the above is that some pass names are removed in favor
of the parameterized versions:
- "khwasan" is now "hwasan<kernel;recover>"
- "kasan" is now "asan<kernel>"
- "kmsan" is now "msan<kernel>"

Differential Revision: https://reviews.llvm.org/D105007
2021-08-19 12:43:37 +02:00
Andrzej Warzynski dcc6b7b1d5 [OptTable] Refine how `printHelp` treats empty help texts
Currently, `printHelp` behaves differently for options that:
  * do not define `HelpText` (such options _are not printed_), and
  * define its `HelpText` as `HelpText<"">` (such options _are printed_).
In practice, both approaches lead to no help text and `printHelp` should
treat them consistently. This patch addresses that by making
`printHelpt` check the length of the help text to be printed.

All affected tests have been updated accordingly. The option definitions
for llvm-cvtres have been updated with a short description or "Not
  implemented" for options that are ignored by the tool.

Differential Revision: https://reviews.llvm.org/D107557
2021-08-19 09:30:15 +00:00