Commit Graph

162738 Commits

Author SHA1 Message Date
Sanjay Patel d16989607b [InstCombine] reduce code duplication in visitBranchInst(); NFCI 2022-10-18 11:34:02 -04:00
Florian Hahn a8e9742bd4
[IndVarSimplify] Clear block and loop dispositions after moving instr.
Moving an instruction can invalidate the cached block dispositions of
the corresponding SCEV. Invalidate the cached dispositions.

Also fixes a copy-paste error in forgetBlockAndLoopDispositions where
the start expression S was removed from BlockDispositions in the loop
but not the current values. This was also exposed by the new test case.

Fixes #58439.
2022-10-18 16:18:14 +01:00
Nikita Popov d06131fda2 [AST] Pass BatchAA to mergeSetIn() (NFCI) 2022-10-18 16:54:55 +02:00
Krzysztof Parzyszek 9fde8e907b [Hexagon] Fix MULHS lowering for HVX v60
The carry bit from an intermediate addition was not properly propagated.
For example mulhs(7fffffff, 7fffffff) was evaluated as 3ffeffff, while
the correct result is 3fffffff.
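
The expected value quoted above can be checked with a small reference model. This is only a sketch in Python that uses arbitrary-precision integers; the helper names `to_signed32` and `mulhs32` are illustrative and are not taken from the Hexagon backend.

```python
def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mulhs32(a: int, b: int) -> int:
    """High 32 bits of the signed 64-bit product, as an unsigned bit pattern."""
    product = to_signed32(a) * to_signed32(b)
    return (product >> 32) & 0xFFFFFFFF


print(hex(mulhs32(0x7FFFFFFF, 0x7FFFFFFF)))  # 0x3fffffff
```

A lowering that splits the multiplication into narrower partial products has to carry every intermediate addition into the high half to match this reference.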
2022-10-18 07:54:38 -07:00
Alexey Bataev e79532d28c [SLP][NFC]Try to fix MSVC buildbots with a workaround, NFC. 2022-10-18 07:50:10 -07:00
uabkaka da137d041b [SimplifyLibCalls] Add NoUndef/NonNull/Dereferenceable attributes to iprintf/siprintf
When SimplifyLibCalls fails to optimize printf and sprintf, it adds
NoUndef/NonNull/Dereferenceable attributes. This patch adds the same attributes
when SimplifyLibCalls optimizes printf/sprintf into the integer-only
iprintf/siprintf.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D136140
2022-10-18 16:36:35 +02:00
Alexey Bataev 6a6fc4890d [SLP][NFC]Formatting of the getEntryCost function, NFC. 2022-10-18 07:18:26 -07:00
Florian Hahn e302fa89aa
[LoopUnroll] Forget exit values when making changes.
When unrolling, the exit values in LCSSA phis will get updated.
Invalidate cached SCEV values for those phis in case SCEV looked through
an exit phi.

Fixes #58340.
2022-10-18 15:12:24 +01:00
Anton Afanasyev e175f99c49 Revert "[MachineCombiner][RISCV] Enable MachineCombiner for RISCV"
This reverts commit 3112cf3b00.
Test breakage: https://lab.llvm.org/buildbot/#/builders/16/builds/36631
2022-10-18 15:57:11 +03:00
Weining Lu 9572406bbc [LoongArch] Fix codegen of atomicrmw nand
Fix invalid RISCV-like MI being emitted for performing the `not`
operation: the LoongArch `xori` zero-extends the immediate, hence is
not equivalent to RISCV `xori`. The LoongArch `not` is a `nor` with
zero.
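
The difference in immediate handling can be modelled in a few lines. This is a sketch that assumes 64-bit registers and 12-bit immediates; the function names are illustrative models of the instructions, not backend code.

```python
MASK64 = (1 << 64) - 1


def sext12(imm: int) -> int:
    """Sign-extend a 12-bit immediate."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def riscv_xori(rs: int, imm: int) -> int:
    """RISC-V xori: the 12-bit immediate is sign-extended."""
    return (rs ^ (sext12(imm) & MASK64)) & MASK64


def loongarch_xori(rj: int, imm: int) -> int:
    """LoongArch xori: the 12-bit immediate is zero-extended."""
    return (rj ^ (imm & 0xFFF)) & MASK64


def loongarch_nor(rj: int, rk: int) -> int:
    """LoongArch nor: bitwise NOT of (rj | rk)."""
    return ~(rj | rk) & MASK64


x = 0x1234
print(hex(riscv_xori(x, -1)))      # full bitwise NOT of x
print(hex(loongarch_xori(x, -1)))  # only the low 12 bits are flipped
print(hex(loongarch_nor(x, 0)))    # full bitwise NOT of x
```

With an all-ones immediate, only the sign-extending form inverts the whole register, which is why `nor` with zero is the correct `not` on LoongArch.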

Differential Revision: https://reviews.llvm.org/D136021
2022-10-18 20:39:20 +08:00
Anton Sidorenko 3112cf3b00 [MachineCombiner][RISCV] Enable MachineCombiner for RISCV
Initial implementation to match basic FP reassociation patterns.

Differential Revision: https://reviews.llvm.org/D135264
2022-10-18 15:31:03 +03:00
Carlos Alberto Enciso b6625765cf [llvm-debuginfo-analyzer] Fix linking errors in buildbots.
The tool used the 'old' LLVM build information (LLVMBuild.txt),
which caused linking errors in:

https://lab.llvm.org/buildbot/#/builders/177/builds/10125
https://lab.llvm.org/buildbot/#/builders/196/builds/19699

Update the CMake configuration to support the new LLVM build
system that uses only CMakeLists.txt.

Reviewed By: jryans

Differential Revision: https://reviews.llvm.org/D136159
2022-10-18 13:05:46 +01:00
Nikita Popov e162a73e41 [CFG] Add const qualifier to isPotentiallyReachableFromMany() (NFC)
Accept a const pointer for StopBB. Unfortunately the worklist has
to use non-const pointers due to LoopInfo interaction.
2022-10-18 10:06:07 +02:00
Carlos Alberto Enciso c28a977b87 Recommit [llvm-debuginfo-analyzer] (02/09) - Driver and documentation
Originally committed in fe7a3cedf7

Reverted in 26dd64ba9c

Buildbot failures:
https://lab.llvm.org/buildbot#builders/139/builds/29663
- unittest triggers an invalid assertion.

https://lab.llvm.org/buildbot#builders/196/builds/19665
- 'has virtual functions but non-virtual destructor' warning as error.

Recommitted with fix:
- Removed the assertion.
- Added virtual destructor.
2022-10-18 08:39:26 +01:00
Dominik Adamski ccd314d320 [OpenMP][OMPIRBuilder] Add generation of SIMD align assumptions to OMPIRBuilder
Currently generation of align assumptions for OpenMP simd construct is done
outside OMPIRBuilder for C code and it is not supported for Fortran.

According to OpenMP 5.0 standard (2.9.3) only pointers and arrays can be
aligned for C code.

If the given aligned variable is a pointer, then Clang generates the following set
of LLVM IR instructions to support the simd align clause:

; memory allocation for pointer address:
%A.addr = alloca ptr, align 8
; some LLVM IR code
; Alignment instructions (alignment is equal to 32):
%0 = load ptr, ptr %A.addr, align 8
call void @llvm.assume(i1 true) [ "align"(ptr %0, i64 32) ]

If the given aligned variable is an array, then Clang generates the following set
of LLVM IR instructions to support the simd align clause:

; memory allocation for array:
%B = alloca [10 x i32], align 16
; some LLVM IR code
; Alignment instructions (alignment is equal to 32):
%arraydecay = getelementptr inbounds [10 x i32], ptr %B, i64 0, i64 0
call void @llvm.assume(i1 true) [ "align"(ptr %arraydecay, i64 32) ]

OMPIRBuilder was modified to generate aligned assumptions. It generates only
llvm.assume calls. Frontend is responsible for generation of aligned pointer
and getting the default alignment value if user does not specify it in aligned
clause.

Unit and regression tests were added to check if aligned clause was handled correctly.

Differential Revision: https://reviews.llvm.org/D133578

Reviewed By: jdoerfert
2022-10-18 02:04:18 -05:00
Max Kazantsev f884a4c957 [NFC] Reuse NonTrivialUnswitchCandidate instead of std::pair 2022-10-18 14:00:53 +07:00
LiaoChunyu 7b970290c0 [RISCV] Optimize SELECT_CC when the true value of select is Constant
(select (setcc lhs, rhs, CC), constant, falsev) -> (select (setcc lhs, rhs, InverseCC), falsev, constant)

This patch removes unnecessary copies
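
The rewrite relies on the fact that inverting the condition code and swapping the two select operands leaves the result unchanged. A minimal Python sketch of that equivalence follows; the condition-code table and function names are illustrative, not the RISC-V backend's.

```python
import operator

# Each condition code paired with its logical inverse.
INVERSE_CC = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}
COMPARE = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "ge": operator.ge,
    "gt": operator.gt, "le": operator.le,
}


def select_cc(lhs, rhs, cc, truev, falsev):
    """(select (setcc lhs, rhs, cc), truev, falsev)"""
    return truev if COMPARE[cc](lhs, rhs) else falsev


def select_cc_swapped(lhs, rhs, cc, constant, falsev):
    """(select (setcc lhs, rhs, InverseCC), falsev, constant)"""
    return select_cc(lhs, rhs, INVERSE_CC[cc], falsev, constant)


print(select_cc(1, 2, "lt", 42, 7), select_cc_swapped(1, 2, "lt", 42, 7))
```

Both forms select the same value for every input; the swapped form simply moves the constant into the false-value position.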

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D129757
2022-10-18 09:24:17 +08:00
Koakuma d3fcbee10d [SPARC] Make calls to function with big return values work
Implement CanLowerReturn and associated CallingConv changes for SPARC/SPARC64.

In particular, for SPARC64 there are new `RetCC_Sparc64_*` functions that handle the return case of the calling convention.
They use the same analysis as the `CC_Sparc64_*` family of functions, but fail if the return value doesn't fit into the return registers.

This makes calls to functions with big return values converted to an sret function as expected, instead of crashing LLVM.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D132465
2022-10-18 00:01:55 +00:00
Arthur Eubanks 308b4bca14 [NFC][SROA] Update comment to use opaque pointers for clarity 2022-10-17 16:37:29 -07:00
Daniel Sanders 021e6e05d3 [instsimplify] Move (extelt (inselt Vec, Value, Index), Index) -> Value from InstCombine
As requested in https://reviews.llvm.org/D135625#3858141

Differential Revision: https://reviews.llvm.org/D136099
2022-10-17 15:22:06 -07:00
Xiang Li 13163dd8ab [HLSL] CodeGen hlsl resource binding.
''register(ID, space)'' like register(t3, space1) will be translated into
i32 3, i32 1 as the last 2 operands for resource annotation metadata.

NamedMetadata for CBuffers and SRVs are added as "hlsl.srvs" and "hlsl.cbufs".
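
As an illustration of the mapping from a `register(...)` annotation to the two integer operands, here is a small Python sketch. The regular expression, the function name `binding_operands`, and the default of space 0 when the space is omitted are assumptions made for this example; they are not taken from the patch.

```python
import re

# One register-class letter followed by the slot number, then an optional spaceN.
_REGISTER = re.compile(r"register\(\s*[a-z](\d+)\s*(?:,\s*space(\d+)\s*)?\)")


def binding_operands(annotation: str) -> tuple:
    """Return (register number, space number) for an annotation string."""
    match = _REGISTER.fullmatch(annotation.strip())
    if match is None:
        raise ValueError(f"unrecognized binding: {annotation!r}")
    # Assumption for this sketch: a missing space is treated as space 0.
    return int(match.group(1)), int(match.group(2) or 0)


print(binding_operands("register(t3, space1)"))  # (3, 1)
```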

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D130951
2022-10-17 14:29:19 -07:00
Craig Topper 2b32e4f98b [RISCV] Add basic support for the sifive-7-series short forward branch optimization.
sifive-7-series has macrofusion support to convert a branch over
a single instruction into a conditional instruction. This can be
an improvement if the branch is hard to predict.

This patch adds support for the most basic case, a branch over a
move instruction. This is implemented as a pseudo instruction so
we can hide the control flow until all code motion passes complete.

I've disabled a recent select optimization if this feature is enabled
in the subtarget.

Related gcc patch for the same optimization https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg211045.html

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135814
2022-10-17 13:56:22 -07:00
Matthias Braun 6d972ad2d8 ControlHeightReduction: Remove assert check in shouldApply
Remove assertion checking for non-empty `ProfileSummaryInfo`.

Differential Revision: https://reviews.llvm.org/D133706
2022-10-17 13:10:13 -07:00
Florian Hahn 6db71b8f14
[ConstraintElim] Use helper to allow overflow for coefficients of GEPs
If the arithmetic for indices of inbounds GEPs overflows, the result is
poison. This means it is also OK for the coefficients to overflow. GEP
decomposition is limited to cases where the index size is <= 64 bit,
which can be represented by int64_t used for the coefficients in the
constraint system.
2022-10-17 20:30:43 +01:00
Han Zhu d0d48a91f8 [X86] Lower vector interleave into unpck and perm
[This Godbolt link](https://godbolt.org/z/s17Kv1s9T) shows different codegen between clang and gcc for a transpose operation.

clang result:
```
        vmovdqu xmm0, xmmword ptr [rcx + rax]
        vmovdqu xmm1, xmmword ptr [rcx + rax + 16]
        vmovdqu xmm2, xmmword ptr [r8 + rax]
        vmovdqu xmm3, xmmword ptr [r8 + rax + 16]
        vpunpckhbw      xmm4, xmm2, xmm0
        vpunpcklbw      xmm0, xmm2, xmm0
        vpunpcklbw      xmm2, xmm3, xmm1
        vpunpckhbw      xmm1, xmm3, xmm1
        vmovdqu xmmword ptr [rdi + 2*rax + 48], xmm1
        vmovdqu xmmword ptr [rdi + 2*rax + 32], xmm2
        vmovdqu xmmword ptr [rdi + 2*rax], xmm0
        vmovdqu xmmword ptr [rdi + 2*rax + 16], xmm4
```
gcc result:
```
        vmovdqu ymm3, YMMWORD PTR [rdi+rax]
        vpunpcklbw      ymm1, ymm3, YMMWORD PTR [rsi+rax]
        vpunpckhbw      ymm0, ymm3, YMMWORD PTR [rsi+rax]
        vperm2i128      ymm2, ymm1, ymm0, 32
        vperm2i128      ymm1, ymm1, ymm0, 49
        vmovdqu YMMWORD PTR [rcx+rax*2], ymm2
        vmovdqu YMMWORD PTR [rcx+32+rax*2], ymm1
```
clang's code is roughly 15% slower than gcc's when evaluated on an internal compression benchmark.

The loop vectorizer generates the following shufflevector intrinsic:
```
%interleaved.vec = shufflevector <32 x i8> %a, <32 x i8> %b, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63>
```
which is lowered to SelectionDAG:
```
t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0
t6: v64i8 = concat_vectors t2, undef:v32i8
t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1
t7: v64i8 = concat_vectors t4, undef:v32i8
t8: v64i8 = vector_shuffle<0,64,1,65,2,66,3,67,4,68,5,69,6,70,7,71,8,72,9,73,10,74,11,75,12,76,13,77,14,78,15,79,16,80,17,81,18,82,19,83,20,84,21,85,22,86,23,87,24,88,25,89,26,90,27,91,28,92,29,93,30,94,31,95> t6, t7
```

So far this `vector_shuffle` is good enough for us to pattern-match and transform, but as we go down the SelectionDAG pipeline, it gets split into smaller shuffles. During dagcombine1, the shuffle is split by `foldShuffleOfConcatUndefs`.
```
  // shuffle (concat X, undef), (concat Y, undef), Mask -->
  // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)
t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0
t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1
t19: v32i8 = vector_shuffle<0,32,1,33,2,34,3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47> t2, t4
t15: ch,glue = CopyToReg t0, Register:v32i8 $ymm0, t19
t20: v32i8 = vector_shuffle<16,48,17,49,18,50,19,51,20,52,21,53,22,54,23,55,24,56,25,57,26,58,27,59,28,60,29,61,30,62,31,63> t2, t4
t17: ch,glue = CopyToReg t15, Register:v32i8 $ymm1, t20, t15:1
```

With `foldShuffleOfConcatUndefs` commented out, the vector is still split later by the type legalizer, which comes after dagcombine1, because v64i8 is not a legal type in AVX2 (64 * 8 = 512 bits while ymm = 256 bits). There doesn't seem to be a good way to avoid this split. Lowering the `vector_shuffle` into unpck and perm during dagcombine1 is too early. Therefore, although somewhat inconvenient, we decided to go with pattern-matching a pair of vector shuffles later in the SelectionDAG pipeline, as part of `lowerV32I8Shuffle`.

The code looks at the two operands of the first shuffle it encounters, iterates through the users of the operands, and tries to find two shuffles that are consecutive interleaves. Once the pattern is found, it lowers them into unpcks and perms. It returns the perm for the shuffle that's currently being lowered (have ISel modify the DAG), and replaces the other shuffle in place.
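
To see why unpck followed by perm reproduces the interleave, the byte movement can be modelled in Python. This is only a sketch: the functions imitate the per-lane behaviour of `vpunpcklbw`/`vpunpckhbw` and `vperm2i128` on lists of 32 byte values, using the immediates 32 and 49 from the gcc output above, and are not the lowering code itself.

```python
def unpack_bytes(a, b, high):
    """Model vpunpcklbw/vpunpckhbw on 32-byte vectors.

    Each 16-byte lane is handled independently: the low (or high) 8 bytes of
    the lane in a and b are interleaved.
    """
    out = []
    for lane in (0, 16):
        start = lane + (8 if high else 0)
        for i in range(start, start + 8):
            out += [a[i], b[i]]
    return out


def perm2i128(src1, src2, imm):
    """Model vperm2i128 for immediates that do not set the zeroing bits."""
    lanes = [src1[:16], src1[16:], src2[:16], src2[16:]]
    return lanes[imm & 3] + lanes[(imm >> 4) & 3]


def interleave(a, b):
    lo = unpack_bytes(a, b, high=False)
    hi = unpack_bytes(a, b, high=True)
    return perm2i128(lo, hi, 32) + perm2i128(lo, hi, 49)


a = list(range(32))
b = list(range(32, 64))
# The shufflevector mask <0, 32, 1, 33, ..., 31, 63> applied to concat(a, b).
mask = [idx for i in range(32) for idx in (i, i + 32)]
expected = [(a + b)[m] for m in mask]
print(interleave(a, b) == expected)  # True
```

The unpack steps interleave correctly inside each 128-bit lane but leave the lanes in the wrong order; the two permutes put them back into sequence.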

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134477
2022-10-17 11:39:27 -07:00
Sjoerd Meijer 5b9597f59a Recommit "[LoopFlatten] Enable it by default"
The sanitizer bots turned green again after another change went in, i.e.
revert 26dd64ba9c, so I don't think this
patch was causing the problems.
2022-10-17 23:27:19 +05:30
Craig Topper 30305d7948 [TargetLowering][RISCV][Sparc] Don't emit zero check in CTTZTableLookup for CTTZ_ZERO_UNDEF.
The code incorrectly checked for CTLZ_ZERO_UNDEF instead of
CTTZ_ZERO_UNDEF.

While I was there I flipped the condition into an early out.
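
For context, a table-lookup count-trailing-zeros is commonly built from a de Bruijn multiplication, and only the variant that must handle a zero input needs the extra check. The Python sketch below assumes that construction for 32-bit values; the constant and the function names are illustrative and not copied from the patch.

```python
DEBRUIJN = 0x077CB531  # a well-known 32-bit de Bruijn sequence

# Build the lookup table: the top 5 bits of (1 << bit) * DEBRUIJN are unique.
TABLE = [0] * 32
for bit in range(32):
    TABLE[((DEBRUIJN << bit) & 0xFFFFFFFF) >> 27] = bit


def cttz_zero_undef(x: int) -> int:
    """Count trailing zeros; the result is meaningless for x == 0."""
    isolated = x & -x  # keep only the lowest set bit
    return TABLE[((isolated * DEBRUIJN) & 0xFFFFFFFF) >> 27]


def cttz(x: int) -> int:
    """Count trailing zeros; defined as the bit width (32) for x == 0."""
    return 32 if x == 0 else cttz_zero_undef(x)


print(cttz(12), cttz(0))  # 2 32
```

When the operation is the zero-undef form, the comparison against zero in `cttz` is unnecessary, which is the check this commit stops emitting.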

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D136010
2022-10-17 10:15:39 -07:00
Kazu Hirata ef9956f434 [IR] Rename FuncletPadInst::getNumArgOperands to arg_size (NFC)
This patch renames FuncletPadInst::getNumArgOperands to arg_size for
consistency with CallBase, where getNumArgOperands was removed in
favor of arg_size in commit 3e1c787b31

Differential Revision: https://reviews.llvm.org/D136048
2022-10-17 10:15:10 -07:00
Fangrui Song 5d3139aef1 [AArch64] Fix warnings 2022-10-17 16:58:52 +00:00
Sjoerd Meijer a71c4e4fbb Revert "[LoopFlatten] Enable it by default"
This reverts commit 233659c7ae.

I see some sanitizer build bot failures. Not sure if this change is
causing them, but let's see if a revert returns the bots to green...
2022-10-17 22:14:20 +05:30
Mingming Liu db0286a096 [AArch64]Enhance 'isBitfieldPositioningOp' to find pattern (shl(and(val,mask), N).
Before this patch (and D135844)

- Given DAG node shl(op, N), isBitfieldPositioningOp uses (optionally shifted [1] ) op as the Src (least significant bits of Src are inserted into DstLSB of Dst node).

After this patch

- If op is and(val, mask), isBitfieldPositioningOp tries to see through the `and` to find out whether val is a simpler source than op.

This helps in a way similar (probably symmetric) to how isSeveralBitsExtractOpFromShr [2] optimizes isBitfieldExtractOpFromShr.

Existing test cases are improved without regressions.

[1] cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2546)
[2] cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2057)

Differential Revision: https://reviews.llvm.org/D135850
2022-10-17 09:01:29 -07:00
Simon Pilgrim 8e77458578 [DAG] visitShiftByConstant - replace constant detection with FoldConstantArithmetic
Instead of checking that an operand is constant/opaque before calling getNode() and then checking that the result is a constant, just use FoldConstantArithmetic which will just early-out if the operands are not constant foldable.
2022-10-17 16:19:10 +01:00
Mingming Liu 45cadb4bd3 [AArch64][NFC]Refactor 'isBitfieldPositioningOp' so that DAG nodes with different Opcode are handled with separate helper functions.
Using different helper functions for DAG nodes with different Opcode allows specialization.

- 'isBitfieldExtractOp' [1] shows how specialization based on Opcode could catch more patterns.
- The refactor paves the way (e.g., makes diff clearer) for enhancement in {D135844,D135850,D135852}

[1] cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2163-L2202)

Differential Revision: https://reviews.llvm.org/D135843
2022-10-17 08:08:48 -07:00
Sanjay Patel 8d76fbb5f0 [VectorCombine] fix crashing on match of non-canonical fneg
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).

By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
2022-10-17 10:47:48 -04:00
Nicola Lancellotti 43fe14c056 [AArch64] Canonicalize ZERO_EXTEND to VSELECT
Differential Revision: https://reviews.llvm.org/D135596
2022-10-17 15:42:46 +01:00
Simon Pilgrim af5942cc09 Remove trailing whitespace. NFC. 2022-10-17 15:20:26 +01:00
Nikita Popov 779fd39684 Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
Relative to the previous attempt, this is rebased over the
InstSimplify fix in ac74e7a780,
which addresses the miscompile reported in PR58401.

-----

foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.

Differential Revision: https://reviews.llvm.org/D134954
2022-10-17 16:11:05 +02:00
Nikita Popov ac74e7a780 [InstSimplify] Only check self-simplify in simplifyInstruction()
InstSimplify currently checks whether the instruction simplifies
back to itself, and returns undef in that case. Generally, this
should only occur in unreachable code.

However, this was also done for the simplifyInstructionWithOperands()
API. In that case, the instruction only serves as a template that
provides the opcode and other non-operand data. In this case,
simplifying back to the same "instruction" may be expected. This
caused PR58401 in conjunction with D134954.

As such, move this check into simplifyInstruction() only. The only
other caller of simplifyInstructionWithOperands() also handles the
self-simplification case explicitly.
2022-10-17 15:52:38 +02:00
Carlos Alberto Enciso 26dd64ba9c Revert "[llvm-debuginfo-analyzer] (02/09) - Driver and documentation"
This reverts commit fe7a3cedf7.
2022-10-17 14:26:48 +01:00
Carlos Alberto Enciso fe7a3cedf7 [llvm-debuginfo-analyzer] (02/09) - Driver and documentation
llvm-debuginfo-analyzer is a command line tool that processes debug
info contained in a binary file and produces a debug information
format agnostic “Logical View”, which is a high-level semantic
representation of the debug info, independent of the low-level
format.

The code has been divided into the following patches:

1) Interval tree
2) Driver and documentation
3) Logical elements
4) Locations and ranges
5) Select elements
6) Warning and internal options
7) Compare elements
8) ELF Reader
9) CodeView Reader

Full details:
https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570

This patch:

Driver and documentation
- Command line options.
- Full documentation.
- String Pool table.

Reviewed By: psamolysov, probinson

Differential Revision: https://reviews.llvm.org/D125777
2022-10-17 13:46:55 +01:00
Florian Hahn 699396131f
Revert "Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify"
This reverts commit 333246b48e.

It looks like this patch causes a mis-compile:
https://github.com/llvm/llvm-project/issues/58401

Fixes #58401.
2022-10-17 12:56:28 +01:00
Sjoerd Meijer 233659c7ae [LoopFlatten] Enable it by default
LoopFlatten has been in the code base off by default for years, but this
enables it to run by default. Downstream this has been running for
years, so it has been exposed to quite some code. Then around the time
we switched to the NPM, several fixes went in related to updating the
MemorySSA state and we moved it to a loop pass manager, which both
helped prevent rerunning certain analysis passes, and thus helped a
bit with compile-times.

About compile-times, adding a pass isn't free, but this should see only
very minor increases. The pass is relatively simple and there shouldn't
be anything algorithmically expensive because all it does is look at
inner/outer loops and check assumptions on loop increments and
indices. If we see increases, I expect this to mainly come from
invalidation of analysis info, and perhaps subsequent passes triggering
and doing more. Despite its simplicity/restrictions, it triggers in most
code-bases, which makes it worth enabling by default.

Differential Revision: https://reviews.llvm.org/D109958
2022-10-17 17:11:39 +05:30
Nathan Sidwell d3b10150b6 [demangler] Simplify OutputBuffer initialization
Every non-testcase use of OutputBuffer contains code to allocate an
initial buffer (using either 128 or 1024 as initial guesses). There's
now no need to do that, given recent changes to the buffer extension
heuristics -- it allocates a 1k(ish) buffer on first need.

Just pass in a buffer (if any) to the constructor.  Thus the
OutputBuffer's ownership of the buffer starts at its own lifetime
start. We can reduce the lifetime of this object in several cases.

That new constructor takes a 'size_t *' for the size argument, as all
uses with a non-null buffer are passing through a malloc'd buffer from
their own caller in this manner.

The buffer reset member function is never used, and is deleted.

Some adjustment to a couple of uses is needed, due to the lazy buffer
creation of this patch.

a) the Microsoft demangler can demangle empty strings to nothing,
which it then memoizes.  We need to avoid the UB of passing nullptr to
memcpy.

b) a unit test checks insertion of no characters into an empty buffer.
We need to avoid UB when converting that to std::string.

The original buffer initialization code would return a failure code if
that first malloc failed.  Existing code either ignored that, called
std::terminate with a FIXME, or returned an error code.

But that's not foolproof anyway, as a subsequent buffer extension
failure ends up calling std::terminate. I am working on addressing
that unfortunate failure mode in a manner more consistent with the C++
ABI design.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D122604
2022-10-17 04:23:16 -07:00
Nikita Popov 436fb27186 [BasicAA] Support loop phis in pointsToConstantMemory()
When looking for underlying objects, if we encounter one that we
have already seen, then we should skip it (as it has already been
checked) rather than bail out. In particular, this adds support
for the case where we have a loop use of a phi recurrence.
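
The skip-instead-of-bail behaviour can be illustrated with a toy worklist in Python. Everything here is hypothetical: the value graph is a plain dictionary and `points_to_constant_memory` is a stand-in written for this example, not the BasicAA implementation.

```python
def points_to_constant_memory(start, operands, is_constant):
    """Return True if every underlying object reachable from start is constant.

    operands maps a look-through value (phi, GEP, ...) to the values it is
    derived from; any value not in the map is treated as an underlying object.
    """
    visited = set()
    worklist = [start]
    while worklist:
        value = worklist.pop()
        if value in visited:
            continue  # already checked: skip it rather than give up
        visited.add(value)
        if value in operands:
            worklist.extend(operands[value])
        elif not is_constant(value):
            return False
    return True


# A loop phi whose backedge value is derived from the phi itself.
graph = {"phi": ["const_global", "gep"], "gep": ["phi"]}
print(points_to_constant_memory("phi", graph, lambda v: v == "const_global"))
```

Treating a revisit as a failure would reject this cyclic case even though the only underlying object is constant.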
2022-10-17 12:34:55 +02:00
Chuanqi Xu 1cedc51ff5 [Coroutines] Don't merge readnone calls in presplit coroutines
Another alternative to fix the thread identification problem in
coroutines.

We plan to fix this problem by unifying memory effect attributes. See
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
But that may be a long-term project, and it is a pity that coroutines
have been unable to resume in different threads for years. So this is a
temporary fix. It may cause unnecessary performance regressions for
coroutines, but correctness is more important. This change is planned to be
reverted once we are actually able to unify the memory effect attributes.

Reviewed By: jdoerfert, rjmccall

Differential Revision: https://reviews.llvm.org/D135550
2022-10-17 10:22:43 +08:00
Kazu Hirata 5ea3155565 [llvm] Use llvm::find (NFC) 2022-10-16 16:21:00 -07:00
Florian Hahn 462ab9810d
[ConstraintElim] Fix signed integer overflow for inbounds GEP.
For inbounds GEPs, signed overflow yields poison, so it is fine for the
coefficients to wrap as well. This fixes an UBSan failure.
2022-10-16 23:25:28 +01:00
Florian Hahn aec0c1009f
[ConstraintElim] Replace custom GEP index handling by using existing code
Instead of duplicating the existing decomposition code for GEP indices
just use the existing code by calling the existing decompose function on
the index expression and multiply the result's coefficients by the scale of
the index.

This both reduces code duplication and generalizes the pattern we can
handle.
2022-10-16 21:53:11 +01:00
Florian Hahn a4635ec710
[ConstraintElim] Support `add nsw` for unsigned preds with positive ops.
If both operands of an `add nsw` are known positive, it can be treated
the same as `add nuw` and added to the unsigned system.

https://alive2.llvm.org/ce/z/6gprff
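
The claim can also be checked by brute force at a small bit width. The sketch below uses 8-bit values and hypothetical helper names; it is an illustration in Python, not a substitute for the Alive2 proof linked above.

```python
BITS = 8
SIGNED_MAX = (1 << (BITS - 1)) - 1
UNSIGNED_MAX = (1 << BITS) - 1


def add_nsw_ok(a: int, b: int) -> bool:
    """True if the signed addition a + b does not wrap."""
    return -(1 << (BITS - 1)) <= a + b <= SIGNED_MAX


def add_nuw_ok(a: int, b: int) -> bool:
    """True if the unsigned addition of the same bit patterns does not wrap."""
    mask = UNSIGNED_MAX
    return (a & mask) + (b & mask) <= UNSIGNED_MAX


def nsw_implies_nuw_for_nonnegative() -> bool:
    """Check every pair of non-negative operands."""
    return all(
        add_nuw_ok(a, b)
        for a in range(SIGNED_MAX + 1)
        for b in range(SIGNED_MAX + 1)
        if add_nsw_ok(a, b)
    )


print(nsw_implies_nuw_for_nonnegative())  # True
print(add_nsw_ok(-1, 1), add_nuw_ok(-1, 1))  # True False
```

The second line shows why the sign restriction matters: with a negative operand, an addition that is fine in the signed sense can still wrap in the unsigned sense.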
2022-10-16 20:25:14 +01:00
Kazu Hirata 7820a30a1b [AMDGPU] Use llvm::any_of (NFC) 2022-10-16 09:19:09 -07:00
Sanjay Patel e5ee0b06d6 [InstCombine] try to determine "exact" for sdiv
If the divisor is a power-of-2 or negative-power-of-2 and the dividend
is known to have at least as many trailing zeros as the divisor, the division is exact:
https://alive2.llvm.org/ce/z/UGBksM (general proof)
https://alive2.llvm.org/ce/z/D4yPS- (examples based on regression tests)

This isn't the most direct optimization (we could create ashr in these
examples instead of relying on existing folds for exact divides), but
it's possible that there's a more general constraint than just a pow2
divisor, so this might be extended in the future.

This should solve issue #58348.
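
The condition can be sanity-checked exhaustively at a small bit width. This Python sketch uses 8-bit signed values and illustrative helper names; it complements, and does not replace, the Alive2 proofs linked above.

```python
BITS = 8
INT_MIN = -(1 << (BITS - 1))
INT_MAX = (1 << (BITS - 1)) - 1


def trailing_zeros(x: int) -> int:
    """Trailing zero bits of the BITS-wide pattern of x (BITS for zero)."""
    x &= (1 << BITS) - 1
    return BITS if x == 0 else (x & -x).bit_length() - 1


def pow2_divisors():
    """All representable powers of two and negated powers of two."""
    divisors = []
    for k in range(BITS - 1):
        divisors += [1 << k, -(1 << k)]
    divisors.append(INT_MIN)  # -(2 ** (BITS - 1)) is representable; its negation is not
    return divisors


def exactness_claim_holds() -> bool:
    for d in pow2_divisors():
        for x in range(INT_MIN, INT_MAX + 1):
            if x == INT_MIN and d == -1:
                continue  # signed division overflow is excluded
            if trailing_zeros(x) >= trailing_zeros(d) and x % d != 0:
                return False
    return True


print(exactness_claim_holds())  # True
```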

Differential Revision: https://reviews.llvm.org/D135970
2022-10-16 10:59:56 -04:00
Sanjay Patel 340ae45be0 [InstCombine] use isKnownNonNegative() for readability; NFCI
This should be functionally equivalent - both calls are thin
wrappers around computeKnownBits(). We'll probably want to use
known-bits directly in follow-up patches because that could
determine "exact" for example (see issue #58348).
2022-10-16 10:59:56 -04:00
Jan Sjodin dd3d8ddb5f [OpenMP][OpenMPIRBuilder] Migrate OffloadEntriesInfoManager from clang to OMPIRbuilder
This patch moves the implementation of the OffloadEntriesInfoManager
to the OMPIRbuilder. This class will later be used by flang as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D135786
2022-10-16 08:32:40 -04:00
Amara Emerson 13792ba417 [AArch64][GlobalISel] When lowering signext i1 parameters, don't zero-extend to s8 first.
Fixes https://github.com/llvm/llvm-project/issues/57181
2022-10-15 20:25:43 -07:00
Peter Rong c2e7c9cb33 [CodeGen] Using ZExt for extractelement indices.
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.

In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change covers the documentation, SelectionDAG, and IRTranslator.
We also included a test for AMDGPU and updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86.

This patch fixes issue #57452.
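
The difference between the two extensions for an `i1 true` index is easy to see in a short Python model. The helpers `sext` and `zext` are written for this example and only mimic the integer semantics.

```python
def zext(value: int, from_bits: int) -> int:
    """Zero-extend: keep the low from_bits bits as an unsigned value."""
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int) -> int:
    """Sign-extend: treat bit (from_bits - 1) as the sign bit."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign


i1_true = 1
print(sext(i1_true, 1))  # -1, an out-of-range element index
print(zext(i1_true, 1))  # 1, the intended element index
```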

Differential Revision: https://reviews.llvm.org/D132978
2022-10-15 15:45:35 -07:00
Kazu Hirata 1b97645e56 [ADT] Introduce StringRef::{starts,ends}_with{,_insensitive}
This patch introduces:

  StringRef::starts_with
  StringRef::starts_with_insensitive
  StringRef::ends_with
  StringRef::ends_with_insensitive

to be more compatible with std::string and std::string_view.

I'm planning to deprecate the existing functions in favor of the new
ones.

Differential Revision: https://reviews.llvm.org/D136030
2022-10-15 15:06:37 -07:00
Kazu Hirata b2f41e9ac1 [Vectorize] Use std::conditional_t (NFC) 2022-10-15 14:52:25 -07:00
Florian Hahn 7c1b80e35c
[ConstraintElim] Support unsigned decomposition of mul/shl nuw..const
Support decomposition for `mul/shl nuw` with constant operand for unsigned
queries. Those expressions should not wrap in the unsigned sense and can
be added directly to the unsigned system.
2022-10-15 21:28:08 +01:00
Kazu Hirata f3a76f0581 [Object] Fix a warning
This patch fixes:

  llvm/lib/Object/XCOFFObjectFile.cpp:1001:20: warning: suggest
  parentheses around ‘&&’ within ‘||’ [-Wparentheses]
2022-10-15 12:43:12 -07:00
Florian Hahn f12684d36e
[ConstraintElim] Support signed decomposition of `add nsw`.
Add support for decomposing `add nsw` for signed queries.
`add nsw` won't wrap and can be directly added to the signed
system.
2022-10-15 18:34:03 +01:00
wanglei 506e936871 [LoongArch] Fix wrong VariantKind for MO_GOT_PC_{HI/LO} flags
Differential Revision: https://reviews.llvm.org/D135946
2022-10-15 17:45:08 +08:00
Alexander Shaposhnikov 25915c6ad2 [objcopy][MachO] Clean up Section ctors, NFC 2022-10-15 00:53:52 +00:00
Kazushi (Jam) Marukawa 0278c9ceb6 [VE] Change the way to lower select
Change to use VEISD::CMOV in combineSelect for better optimization.
Support VEISD::CMOV in combineTRUNCATE also to optimize truncate.
Merge functions to handle condition codes to VE.h.  And add basic
CMOV patterns to VEInstrInfo.td.  Update regression tests also.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D135878
2022-10-15 08:49:36 +09:00
Krzysztof Parzyszek fb063ea2ea [Hexagon] Clean up leftover instructions in HvxIdioms
Quick and dirty fix, because this is causing one builder to fail.
2022-10-14 16:45:03 -07:00
Krzysztof Parzyszek 6cb2a02a38 [Hexagon] Report if changes were made in HvxIdioms pass
This should fix
```
Pass modifies its input and doesn't report it: Hexagon Vector Combine
Pass modifies its input and doesn't report it UNREACHABLE executed at
[...hecks-debian/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1436!
```
2022-10-14 15:46:33 -07:00
Keith Smiley c2d209476c
[llvm-objcopy][MachO] Add support for LC_DYLIB_CODE_SIGN_DRS
This allows binaries containing the LC_DYLIB_CODE_SIGN_DRS to be
objcopy'd and stripped.

Differential Revision: https://reviews.llvm.org/D135988
2022-10-14 15:41:19 -07:00
Zequan Wu 82035ec777 Revert "[PGO] Make emitted symbols hidden"
This reverts commit ecac223b0e.

The commit causes instrprof-darwin-dead-strip.c to fail on mac.
2022-10-14 15:23:26 -07:00
Krzysztof Parzyszek 361a27c155 [Hexagon] Recognize idioms for fixed-point vector multiplication
Recognize Q.15*Q.15 and Q.31*Q.31, with and without rounding.
2022-10-14 15:22:25 -07:00
Martin Storsjö 6eb205b257 Reapply [AArch64] Fix aligning the stack after calling __chkstk
Whenever a call to __chkstk was made, the frame lowering previously
omitted the aligning (as NumBytes was reset to zero before doing
alignment).

This fixes https://github.com/llvm/llvm-project/issues/56182.

The initial version of this produced invalid code for small
functions with no local stack allocations, if those functions
were marked with the "stackrealign" attribute. If building
with -mstack-alignment=16 (which otherwise mostly would be a
no-op), this attribute is added on the main function.

Differential Revision: https://reviews.llvm.org/D135687
2022-10-15 00:40:13 +03:00
Krzysztof Parzyszek b465a98316 [Hexagon] Fix isTypeForHVX for vector predicates
HexagonSubtarget::isTypeForHVX would stop breaking the type up when it
reached 64 bits in width. HVX vector predicates can be shorter than that,
for example <32 x i1> would have a bitwidth of 32, and it's still a valid
HVX type.
2022-10-14 14:38:41 -07:00
Krzysztof Parzyszek 705e77abed [Hexagon] Lower funnel shifts for HVX
HVX v62+ has bidirectional shifts, which do not mask the shift amount to
the bit width. Instead, the shift amount is sign-extended from the log(BW)
bit value, and a negative value causes a shift in the other direction.
For the shift amount being -log(BW), this reversed shift will shift all
bits out, inserting 0s or sign bits depending on the type and direction.
2022-10-14 14:13:18 -07:00
Florian Hahn 16cf666bb7
[Loop] Move block and loop dispo invalidation to makeLoopInvariant.
makeLoopInvariant may recursively move its operands to make them
invariant, before moving the passed in instruction. Those recursively
moved instructions are currently missed when invalidating block and loop
dispositions.

To address this, move the invalidation code to Loop::makeLoopInvariant.

Fixes #58314.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D135909
2022-10-14 21:58:14 +01:00
serge-sans-paille 232e0a011e
[lto] Do not try to internalize symbols with escaped name
Because of the LLVM mangling escape sequence (the '\01' prefix), it is possible
for a single symbol to have two different IR representations.

For instance, consider @symbol and @"\01_symbol". On OSX, because of the system
mangling rules, these two IR names are converted into the same final symbol
upon linkage.

LTO doesn't model this behavior, which may result in symbols being incorrectly
internalized (if all references use the escape sequence while the definition
doesn't).

The proper approach is probably to use the mangled name to compute GUID to
avoid the dual representation, but we can also avoid discarding symbols that are
bound to two different IR names. This is an approximation, but it's less
intrusive on the codebase.

Fix #57864

Differential Revision: https://reviews.llvm.org/D135710
2022-10-14 22:34:17 +02:00
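The dual representation can be illustrated with a small sketch of how an IR name maps to a final symbol name. The helper `linkerName` is hypothetical and only models two rules: a leading '\01' suppresses mangling, and otherwise the target's global prefix (assumed here to be '_', as on Mach-O) is prepended.

```cpp
#include <string>

// Hypothetical model of IR-name-to-symbol mangling. GlobalPrefix is '\0'
// for targets without a prefix.
inline std::string linkerName(const std::string &IRName, char GlobalPrefix) {
  // A leading '\01' means "emit the rest of the name verbatim".
  if (!IRName.empty() && IRName[0] == '\01')
    return IRName.substr(1);
  if (GlobalPrefix == '\0')
    return IRName;
  return std::string(1, GlobalPrefix) + IRName;
}
```

With a '_' prefix, both `linkerName("symbol", '_')` and `linkerName("\01_symbol", '_')` give `_symbol`, which is the collision that internalization has to account for.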
Zain Jaffal 0c8dde551c [ConstraintElimination] Move logic for replacing ssub overflow users (NFC)
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D134044
2022-10-14 21:14:21 +01:00
Filipp Zhinkin ef774bec63 [AArch64] Support SETCCCARRY lowering
Support SETCCCARRY lowering to SBCS instruction.

Related issue: https://github.com/llvm/llvm-project/issues/44629

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135302
2022-10-14 22:29:31 +03:00
Argyrios Kyrtzidis d877e3fe71 [Transforms/ObjCARC] Fix non-deterministic output of `ObjCARCOptPass`
`ProvenanceAnalysis::related()` was assuming that the order of parameters for `relatedCheck()` was not affecting
the result but this was not the case when both parameters were `PHINode`s.
Due to this assumption `ProvenanceAnalysis::related()` was ordering the parameters based on pointer value which resulted in
non-deterministic behavior.

To address this, change `relatedPHI()` so that it gives the same result independent of the parameter order.

rdar://100325456

Differential Revision: https://reviews.llvm.org/D135376
2022-10-14 12:26:58 -07:00
Craig Topper 1fab0ac559 [RISCV] Rename ReadVIALUCV->ReadVICALUV to match WriteVICALUV. NFC 2022-10-14 12:11:55 -07:00
Krzysztof Parzyszek e8375e3042 [Hexagon] Use IRBuilderBase in function parameters
This will allow using builders with different folders.
2022-10-14 12:10:59 -07:00
Krzysztof Parzyszek 7f4ce3f1eb [Hexagon] Introduce PS_vsplat[ir][bhw] pseudo instructions
HVX v60 only has splats that take a 32-bit word as input, while v62+
has splats that take 8- or 16-bit value. This makes writing output
patterns that need to use a splat annoying, because the entire output
pattern needs to be replicated for various versions of HVX.
To avoid this, the patterns will always use the pseudos, and then the
pseudos will be handled using a post-ISel hook.
2022-10-14 12:03:13 -07:00
Craig Topper d3366efd43 [LV] Simplify register usage code and avoid double map lookups. NFC
Instead of checking whether a map entry exists to decide if we should
initialize it or add to it, we can rely on the map entry being constructed
and initialized to 0 before the addition happens.

For the std::max case, I've made a reference to the map entry to
avoid looking it up twice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135977
2022-10-14 11:55:48 -07:00
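The simplification described above can be sketched with a standard container. This uses `std::map` and hypothetical function names as stand-ins; the actual code in the vectorizer uses LLVM's own map types and keys.

```cpp
#include <algorithm>
#include <map>
#include <string>

using UsageMap = std::map<std::string, unsigned>;

// Before: check for the entry first, then look it up again to update it.
inline void addUsageChecked(UsageMap &M, const std::string &Key, unsigned N) {
  if (M.find(Key) == M.end())
    M[Key] = N;
  else
    M[Key] += N;
}

// After: operator[] value-initializes a missing entry to 0, so a plain
// addition covers both cases with a single lookup.
inline void addUsage(UsageMap &M, const std::string &Key, unsigned N) {
  M[Key] += N;
}

// For the max case, bind a reference so the entry is looked up only once.
inline void recordMax(UsageMap &M, const std::string &Key, unsigned N) {
  unsigned &Entry = M[Key];
  Entry = std::max(Entry, N);
}
```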
Chris Bieneman 911d2dc230 [NFC] [HLSL] Move common metadata to LLVMFrontend
This change pulls some code from the DirectX backend into a new
LLVMFrontendHLSL library to share utility data structures between the
HLSL code generation in Clang and the backend in LLVM.

This is a small refactoring as a first step to get code into the
right structure and get the library built and dependencies correct.

Fixes #58000 (https://github.com/llvm/llvm-project/issues/58000)

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135110
2022-10-14 13:40:04 -05:00
Craig Topper 44f0b13494 [RISCV] Correct RISCVTTIImpl::getRegUsageForType for vectors of pointers.
getPrimitiveSizeInBits returns 0 for pointers, we need to query
the size via DataLayout instead.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135976
2022-10-14 11:34:12 -07:00
Chris Bieneman e530a1188e [DX] Add pass to pretty-print DXIL metadata in asm
When DXC prints IR output it adds a bunch of IR comments in a header
that describe the DXIL metadata in a more human-readable format. This
pass will serve that purpose for LLVM by printing out ahead of the IR
printer.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135802
2022-10-14 13:32:59 -05:00
Caroline Concatto 60e2aad109 [AArch64]Change printVectorList to print SVE vector range
This patch changes the preferred disassembly for SVE vector lists.
For instance, instead of printing this assembly:
  ld4d { z1.d, z2.d, z3.d, z4.d }, p0/z, [x0]
it will print this:
  ld4d { z1.d-z4.d }, p0/z, [x0]

Differential Revision: https://reviews.llvm.org/D135952
2022-10-14 18:59:56 +01:00
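A minimal sketch of the two output forms shown above. The helper `formatVectorList` and its parameters are hypothetical, it only handles consecutive `z` registers, and it does not reproduce any special cases the real printVectorList may have.

```cpp
#include <string>

// Format Count consecutive registers starting at z<First>, either as a
// comma-separated list or as a first-last range.
inline std::string formatVectorList(unsigned First, unsigned Count,
                                    const std::string &Suffix, bool AsRange) {
  auto Reg = [&](unsigned N) { return "z" + std::to_string(N) + Suffix; };
  std::string Out = "{ ";
  if (AsRange && Count > 1) {
    Out += Reg(First) + "-" + Reg(First + Count - 1);
  } else {
    for (unsigned I = 0; I < Count; ++I) {
      if (I != 0)
        Out += ", ";
      Out += Reg(First + I);
    }
  }
  return Out + " }";
}
```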
David Green de6dfbbb30 [ARM] Fix for MVE i128 vector icmp costs.
We were hitting an assert as the legalized type needn't be a vector.

Fixes #58364
2022-10-14 18:49:25 +01:00
Hassnaa Hamdi 2c72d90ecc [AArch64-SVE]: Force generating code compatible to streaming mode.
Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width vectors
to generate code that is compatible with streaming mode.

Differential Revision: https://reviews.llvm.org/D133433
2022-10-14 17:46:56 +00:00
Angelo Matni ccde601f14 Fix llvm/lib/ObjCopy, llvm/llvm-ifs: c++20 compatibility
Cleanup: avoid referring to `std::vector<T>` members when `T` is incomplete.

This is [not legal](https://timsong-cpp.github.io/cppwp/n4868/vector#overview-4)
according to the C++ standard, and causes build errors in particular in C++20
mode. Fix it by defining the vector's type before using the vector.

Reviewed By: saugustine, MaskRay

Differential Revision: https://reviews.llvm.org/D135906
2022-10-14 10:28:46 -07:00
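The shape of the fix can be sketched as follows. The type names are hypothetical and not taken from the patch; the point is only the ordering: the element type is complete before any member of the vector is referenced.

```cpp
#include <cstddef>
#include <vector>

// Problematic ordering (shown as a comment): Section is only forward-declared
// at the point where members of std::vector<Section> are referenced.
//
//   struct Section;
//   struct Object {
//     std::vector<Section> Sections;
//     std::size_t count() const { return Sections.size(); }
//   };
//   struct Section { int Index = 0; };

// Fixed ordering: define the element type first, then use the vector.
struct Section {
  int Index = 0;
};

struct Object {
  std::vector<Section> Sections;
  std::size_t count() const { return Sections.size(); }
};
```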
chenglin.bi c1909d7337 [DAGCombiner] Fix crash for the merge stores with different value type
The crash case comes from #58350. It has two stores: one of type f32 and the other of type v1f32.
When we try to merge these two stores as v1f32, the memVT is a vector type, so the old code used ISD::EXTRACT_SUBVECTOR for the f32 value as well, which crashed the compiler.
This patch inserts a build_vector for the f32 store so that it also produces v1f32 when memVT is v1f32.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135954
2022-10-15 01:16:35 +08:00
Nicola Lancellotti ce1a2ccf94 [NFC] Fix typo in DAGCombiner 2022-10-14 17:47:25 +01:00
Dmitry Preobrazhensky bf96703fb3 [AMDGPU][MC][GFX8+] Correct v_cndmask modifiers
Correct v_cndmask_b32 to support abs/neg modifiers in dpp/sdwa/e64 variants.
Correct v_cndmask_b16 for proper disassembly of abs/neg modifiers in e64_dpp variants.

Differential Revision: https://reviews.llvm.org/D135900
2022-10-14 19:37:27 +03:00
Sander de Smalen 02df03c5b7 [AArch64][SME] Add support for arm_locally_streaming functions.
Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start
of the function, and a SMSTOP at the end of the function, such that all
operations use the right value for vscale.

Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require glueing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131582
2022-10-14 13:47:53 +00:00
chenglin.bi 85e41fcaac [AArch64] Select to CCMN when the CCMP's second operand is a negative constant
The second operand of CCMP/CCMN supports constants from 0 to 31. When the CCMP's second operand is in the range [-31, -1], we can replace it with CCMN to avoid an extra mov.

Fix: #57034

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135939
2022-10-14 21:41:25 +08:00
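The range check described above can be sketched as a small selection helper. The names `CondCmpOp`, `CondCmpImm`, and `selectCondCmpImm` are hypothetical and not LLVM APIs; the sketch only decides which instruction and immediate would encode a given constant, assuming C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>

enum class CondCmpOp { CCMP, CCMN };

struct CondCmpImm {
  CondCmpOp Op;
  unsigned Imm; // encoded immediate, always in [0, 31]
};

// Pick an immediate form for comparing against constant C, if one exists.
inline std::optional<CondCmpImm> selectCondCmpImm(int64_t C) {
  if (C >= 0 && C <= 31)
    return CondCmpImm{CondCmpOp::CCMP, unsigned(C)};
  if (C >= -31 && C <= -1)
    return CondCmpImm{CondCmpOp::CCMN, unsigned(-C)}; // compare against -Imm
  return std::nullopt; // constant must be materialized in a register
}
```

For instance, a conditional compare against -5 maps to CCMN with immediate 5, while 32 has no immediate form in this sketch.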
Florian Hahn 5a68e578ca
[ConstraintElim] Add debug message when decomposition fails. 2022-10-14 11:02:05 +01:00
Martin Storsjö f309f095e7 Revert "[AArch64] Fix aligning the stack after calling __chkstk"
This reverts commit 50e0aced45.

This could accidentally start producing invalid code in some
cases (in particular, if compiling with -mstack-alignment=16, which
one could expect to be a no-op for a target where the stack always
is aligned to 16 bytes anyway).
2022-10-14 11:55:59 +03:00
Benjamin Kramer 08dc847f33 Add missing `override`s after aad013de41 2022-10-14 10:38:32 +02:00
Nikita Popov 237b962031 [BasicAA] Account for cycles when checking for same select condition
If we have translated across a cycle backedge, the same SSA value
for the condition might be referring to two different loop iterations.
Use the isValueEqualInPotentialCycles() helper to avoid assuming
equality in that case.
2022-10-14 10:37:40 +02:00
Nikita Popov 03f9d0ff22 [TBAA] Model call accessing immutable type as readnone
Accesses to constant memory are not observable and should be
reported as readnone, not readonly. This is consistent with what
we do for normal (non-call) instructions: For those, the TBAA
metadata will result in pointsToConstantMemory() returning true,
which will then result in a NoModRef result, not a Ref result.

Differential Revision: https://reviews.llvm.org/D135864
2022-10-14 10:08:37 +02:00
gonglingqin e632bb6543 [LoongArch] Add codegen support for atomicrmw umin/umax operation on LA64
Furthermore, use `beqz $rd, .BB` instead of `beq $rd, $zero, .BB`.

Differential Revision: https://reviews.llvm.org/D135525
2022-10-14 15:24:43 +08:00
Leon Clark 6370bc2435 Add f16 nearbyint support.
Enable lowering of FNEARBYINT for f16 and extend existing tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135124
2022-10-14 08:05:24 +01:00
Matt Arsenault d0750ec475 AtomicExpand: Avoid some operations if the atomic is overaligned
Let some of the pointer bithacking fold away if we know the LSB are 0.
2022-10-13 23:31:00 -07:00
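A rough sketch of the kind of pointer arithmetic involved when a sub-word atomic is widened to a 32-bit word. The struct and function names are hypothetical, and the code assumes a little-endian layout and a 4-byte word; it is not the pass's actual implementation.

```cpp
#include <cstdint>

struct PartwordInfo {
  uintptr_t AlignedAddr; // address of the containing 32-bit word
  unsigned ShiftAmt;     // bit offset of the value inside that word
  uint32_t Mask;         // bits of the word occupied by the value
};

// ValueBytes is expected to be 1, 2, or 4.
constexpr PartwordInfo partwordInfo(uintptr_t Addr, unsigned ValueBytes) {
  uintptr_t Aligned = Addr & ~uintptr_t(3);  // clear the two low bits
  unsigned Shift = unsigned(Addr & 3) * 8;   // byte offset to bit offset
  uint32_t ValueMask =
      ValueBytes >= 4 ? 0xFFFFFFFFu : ((1u << (ValueBytes * 8)) - 1);
  return {Aligned, Shift, ValueMask << Shift};
}
```

If the low two bits of `Addr` are known to be zero, `AlignedAddr` equals `Addr` and `ShiftAmt` is 0, so the masking and shifting can fold away.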