Commit Graph

160922 Commits

Author SHA1 Message Date
Nikita Popov a21c245307 [ARMParallelDSP] Remove unnecessary ModRef intersection (NFC)
Intersecting with ModRef is a no-op, as these are the only two
possible values.
2022-08-01 08:34:58 +02:00
Nikita Popov 5b1d10bda6 [AA] Drop setModAndRef() function (NFC)
Without the "must" state, this function is pointless, because we
can just directly create a ModRef instead.
2022-08-01 07:55:39 +02:00
Nikita Popov 34683c3e35 [MSSA] Fix expensive checks build 2022-08-01 07:28:52 +02:00
Nikita Popov f96ea53e89 [AA] Do not track Must in ModRefInfo
getModRefInfo() queries currently track whether the result is a
MustAlias on a best-effort basis. The only user of this functionality
is the optimized memory access type in MemorySSA -- which in turn
has no users. Given that this functionality has not found a user
since it was introduced five years ago (in D38862), I think we
should drop it again.

The context is that I'm working to separate FunctionModRefBehavior
to track mod/ref for different location kinds (like argmem or
inaccessiblemem) separately, and the fact that ModRefInfo also has
an unrelated Must flag makes this quite awkward, especially as this
means that NoModRef is not a zero value. If we want to retain the
functionality, I would probably split getModRefInfo() results into
a part that just contains the ModRef information, and a separate
part containing a (best-effort) AliasResult.

Differential Revision: https://reviews.llvm.org/D130713
2022-08-01 07:14:31 +02:00
Chuanqi Xu 9701053517 Introduce @llvm.threadlocal.address intrinsic to access TLS variable
This belongs to a series of patches which try to solve the thread
identification problem in coroutines. See
https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015
for a full background.

The problem consists of two concrete problems: TLS variable and readnone
functions. This patch tries to convert the TLS problem to readnone
problem by converting the access of TLS variable to an intrinsic which
is marked as readnone.

The readnone problem would be addressed in following patches.

Reviewed By: nikic, jyknight, nhaehnle, ychen

Differential Revision: https://reviews.llvm.org/D125291
2022-08-01 10:51:30 +08:00
Kazu Hirata bf6021709a Use drop_begin (NFC) 2022-07-31 15:17:09 -07:00
Kazu Hirata d11103f9a0 [Hexagon] Remove unused declaration adjustForCalleeSavedRegsSpillCall (NFC)
The function definition was removed on Apr 23, 2015 in commit
876a19d855, but the declaration has
remained since.
2022-07-31 15:17:06 -07:00
Kazu Hirata 71638b8be7 [ExecutionEngine] Ensure newlines at the end of files (NFC) 2022-07-31 15:16:58 -07:00
Luís Marques 260a641068 [RISCV] Pre-RA expand pseudos pass
Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up
patches to fold the addi of PseudoLLA instructions into the immediate
operand of load/store instructions.

Differential Revision: https://reviews.llvm.org/D123264
2022-07-31 23:19:00 +02:00
Sanjay Patel 02b3a35892 [InstSimplify] fold FP rounding intrinsic with rounded operand
issue #56775

I rearranged the Thumb2 codegen test to avoid simplifying the chain
of rounding instructions. I'm assuming the intent of the test is
to verify lowering of each of those intrinsics.
2022-07-31 10:00:27 -04:00
Simon Pilgrim acb5abb7d3 [X86] getFauxShuffleMask - use DemandedElts variant of getTargetShuffleInputs. NFCI.
We don't specify the demanded elts yet, this patch just rewires the getTargetShuffleInputs calls and gives an "all demanded elts" mask.
2022-07-31 12:15:04 +01:00
Simon Pilgrim 9cdba33337 [X86] combineX86ShufflesRecursively - determine demanded elts to pass to getTargetShuffleInputs
Only PACKSS/PACKUS faux shuffles make use of the demanded elts at the moment, but this at least improves the handling of a couple of truncation patterns.
2022-07-31 11:30:40 +01:00
Sunho Kim c559072e46 [JITLink][COFF] Remove unused variable. 2022-07-31 09:19:17 +09:00
Sunho Kim b501770aef [JITLink][COFF] Handle COMDAT symbol with offset.
Handles COMDAT symbol with an offset and refactor the code to only generated symbol if the second symbol was encountered. This happens very infrequently but happens in recursive_mutex implementation of MSVC STL library.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130454
2022-07-31 09:09:48 +09:00
Sunho Kim d86f903b1d [JITLink][COFF][x86_64] Implement remaining IMAGE_REL_AMD64_REL32_*.
Implements remaining IMAGE_REL_AMD64_REL32_*. We only need IMAGE_REL_AMD64_REL32_4 for now but doing all remaining ones for completeness. (clang only uses IMAGE_REL_AMD64_REL32_1 and IMAGE_REL_AMD64_REL32)

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130452
2022-07-31 09:03:28 +09:00
Sunho Kim e781451140 [JITLink] Relax zero-fill edge assertions.
Relax zero-fill edge assertions to only consider relocation edges. Keep-alive edges to zero-fill blocks can cause this assertion which is too strict.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130450
2022-07-31 08:34:10 +09:00
Sunho Kim ee9cf336d6 [JITLink][COFF] Remove obsolete FIXMEs. (NFC) 2022-07-31 08:10:14 +09:00
Sunho Kim ea75c25833 [JITLInk][COFF] Remove unnecessary unique_ptr. (NFC) 2022-07-31 08:08:19 +09:00
Sunho Kim 067faddb55 [JITLink][COFF] Add explicit std::move.
Since ArgList is not copyable we need to make sure it's moved explicitly.
2022-07-31 08:01:00 +09:00
Sunho Kim 88181375a3 [JITLink][COFF] Implement include/alternatename linker directive.
Implements include/alternatename linker directive. Alternatename is used by static msvc runtime library. Alias symbol is technically incorrect (we have to search for external definition) but we don't have a way to represent this in jitlink/orc yet, this is solved in the following up patch.

Inlcude linker directive is used in ucrt to forcelly lookup the static initializer symbols so that they will be emitted. It's implemented as extenral symbols with live flag on that cause the lookup of these symbols.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130276
2022-07-31 07:49:59 +09:00
Craig Topper d21b315360 [RISCV] Remove vmerges from vector ceil, floor, trunc lowering.
Use masked operations to suppress spurious exception bits being
set in fflags. Unfortunately, doing this adds extra copies.
2022-07-30 10:58:41 -07:00
Simon Pilgrim df457f583a [X86] Use std::tie so we can have more meaningful variable names for demanded bits/elts pairs. NFCI.
*.first + *.second were proving difficult to keep track of.
2022-07-30 18:57:15 +01:00
Kazu Hirata 12b29900a1 Use any_of (NFC) 2022-07-30 10:35:56 -07:00
Kazu Hirata 60db8d9b4e Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2022-07-30 10:35:48 -07:00
Kazu Hirata 66b6cc3acd [ExecutionEngine] Ensure a newline at the end of a file (NFC) 2022-07-30 10:35:43 -07:00
Craig Topper a23f07fb1d [RISCV] Add merge operands to more RISCVISD::*_VL opcodes.
This adds a merge operand to all of the binary _VL nodes. Including
integer and widening. They all share multiclasses in tablegen
so doing them all at once was easiest.

I plan to use FADD_VL in an upcoming patch. The rest are just for
consistency to keep tablegen working.

This does reduce the isel table size by about 25k so that's nice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130816
2022-07-30 10:26:38 -07:00
Craig Topper 9bf305fe2b [RISCV] Swap the merge and mask operand order for VRGATHER*_VL and FCOPYSIGN_VL nodes.
Based on review feedback from D130816.
2022-07-30 09:57:05 -07:00
Simon Pilgrim a14f94c20c [X86] computeKnownBitsForTargetNode - out of range X86ISD::VSRAI doesn't fold to zero
Noticed by inspection and I can't seem to make a test case, but SSE arithmetic bit shifts clamp to the max shift amount (i.e. create a sign splat) - combineVectorShiftImm already does something similar.
2022-07-30 17:55:39 +01:00
Dmitry Vassiliev adc387460d [CodeGen] Fixed undeclared MISchedCutoff in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS
This patch fixes the error llvm/lib/CodeGen/MachineScheduler.cpp(755): error C2065: 'MISchedCutoff': undeclared identifier in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS.
Note MISchedCutoff is declared under #ifndef NDEBUG.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130425
2022-07-30 18:24:50 +02:00
Sanjay Patel 7073ec530e [InstCombine] canonicalize more zext-and-of-bool compare to narrow and
https://alive2.llvm.org/ce/z/vBNiiM

This matches variants of patterns that were folded with:
b5a9361c90
2022-07-30 11:22:05 -04:00
Austin Kerbow 7898426a72 [AMDGPU] Remove unused function 2022-07-30 07:47:35 -07:00
Simon Pilgrim 49c0980eac Fix Wdocumentation warning. NFC.
warning: '\returns' command used in a comment that is attached to a function returning void
2022-07-30 15:41:13 +01:00
Simon Pilgrim 813459ed2b [X86] combineSelect fold 'smin' style pattern select(pcmpgt(RHS, LHS), LHS, RHS) -> select(pcmpgt(LHS, RHS), RHS, LHS) if pcmpgt(LHS, RHS) already exists
Avoids repeated commuted comparisons when we're performing min/max and clamp patterns
2022-07-30 15:31:36 +01:00
Nuno Lopes d4b4747de5 ConstantFolding: fold OOB accesses to poison instead of undef 2022-07-30 15:20:32 +01:00
Sanjay Patel f95a6aea1b [InstCombine] avoid splitting a constant expression with div/rem fold
Follow-up to d4940c0f3d to further limit the transform
to avoid an unintended pattern/fold of a constant expression.
2022-07-30 09:45:25 -04:00
Simon Pilgrim 9ad082eb5a [DAG] Pull out repeated getOperand() calls for shuffle ops. NFC. 2022-07-30 14:02:54 +01:00
Simon Pilgrim 276480b1d3 [AMDGPU] Fix || vs && precedence warning. NFC. 2022-07-30 14:02:54 +01:00
Nuno Lopes fffabd5348 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-30 13:55:56 +01:00
Alexander Shaposhnikov 4220ef2be1 [InstCombine] Add fold for redundant sign bits count comparison
For power-of-2 C:
((X s>> ShiftC) ^ X) u< C --> (X + C) u< (C << 1)
((X s>> ShiftC) ^ X) u> (C - 1) --> (X + C) u> ((C << 1) - 1)

(https://github.com/llvm/llvm-project/issues/56479)

Test plan:
0/ ninja check-llvm check-clang + bootstrap LLVM/Clang
1/ https://alive2.llvm.org/ce/z/eEUfx3

Differential revision: https://reviews.llvm.org/D130433
2022-07-30 09:06:53 +00:00
Carl Ritson 4c4db81630 [AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742
2022-07-30 11:38:39 +09:00
Alexander Shaposhnikov d982f1e0c6 [InstCombine] Refactor foldICmpMulConstant
This is a follow-up to 2ebfda2417
(replace "if" with "else if" since the cases nuw/nsw
were meant to be handled separately).

Test plan:
1/ ninja check-llvm check-clang check-lld
2/ Bootstrapped LLVM/Clang pass tests
2022-07-30 02:29:15 +00:00
Fangrui Song ce6dd4e835 Revert D130458 "[llvm-objcopy] Support --{,de}compress-debug-sections for zstd"
This reverts commit c26dc2904b.

The new Zstd dispatch has an ongoing design discussion related to https://reviews.llvm.org/D130516#3688123 .
Revert for now before it is resolved.
2022-07-29 15:46:51 -07:00
Sanjay Patel d4940c0f3d [InstCombine] fix miscompile from urem/udiv transform with constant expression
The isa<Constant> check could misfire on an instruction with 2 constant
operands. This bug was introduced with bb789381fc (D36988).

See issue #56810 for a C source example that exposed the bug.
2022-07-29 17:14:30 -04:00
Craig Topper e637feee80 [RISCV] Add isel pattern for (setne/eq GPR, -2048)
For constants in the range [-2047, 2048] we use addi. If the constant
is -2048 we can use xori. If we don't match this explicitly, we'll
emit an LI for the -2048 followed by an XOR.
2022-07-29 14:07:38 -07:00
Jay Foad 9436a85eb6 [IRBuilder] Make createCallHelper a member function. NFC.
This just avoids explicitly passing in 'this' as an argument in a bunch
of places.

Differential Revision: https://reviews.llvm.org/D130752
2022-07-29 21:17:26 +01:00
Austin Kerbow 2c82a126d7 [AMDGPU] Omit unnecessary waitcnt before barriers
It is not necessary to wait for all outstanding memory operations before
barriers on hardware that can back off of the barrier in the event of an
exception when traps are enabled. Add a new subtarget feature which
tracks which HW has this ability.

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D130722
2022-07-29 11:12:36 -07:00
Sanjay Patel b5a9361c90 [InstCombine] canonicalize zext-and-of-bool compare to narrow and
https://alive2.llvm.org/ce/z/3jYbEH

We should choose one of these forms, and the option that uses
the narrow type allows the motivating example from issue #56294
to reduce. In the best case (no 'not' needed and 'trunc' remains),
this does remove an instruction.

Note that there is what looks like a regression because there
is an existing canonicalization that turns trunc into and+icmp.
That is a long-standing transform, and I'm not sure what effect
reversing it would have.
2022-07-29 12:02:54 -04:00
Matt Devereau a8b726ac65 [AArch64][SVE] Change DupLane128Combine Index comparison to 0
IdxInsert == IdxDupLane is incorrect. IdxInsert is the starting element number,
whereas IdxIndex is the index of a quadword
2022-07-29 14:31:00 +00:00
Simon Pilgrim bc2c4f6c85 [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X) (REAPPLIED)
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.

Updated version - original version rG057db2002bb3 didn't correctly account for multiple uses of the mask that might be folding "OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))" in canonicalizeBitSelect
2022-07-29 15:12:26 +01:00
Nikita Popov 5eaeeed8cb [InstCombine] Avoid ConstantExpr::getFNeg() calls (NFCI)
Instead call the constant folding API, which can fail. For now,
this should be NFC, as we still allow the creation of fneg
constant expressions.
2022-07-29 16:01:46 +02:00
Amaury Séchet 226086230c [DAG] Use recursivelyDeleteUnusedNodes in CommitTargetLoweringOpt.
It simplifies the logic and removes the need for manual bookkeeping.

Differential Revision: https://reviews.llvm.org/D130445
2022-07-29 13:49:03 +00:00
Mirko Brkusanin 6a1aa627fa [AMDGPU] Enable image_gather4h instruction for gfx10 and gfx11
Differential Revision: https://reviews.llvm.org/D130764
2022-07-29 15:42:06 +02:00
Alexey Lapshin ece341f598 [Debuginfo][DWARF][NFC] Add paired methods working with DWARFDebugInfoEntry.
This review is extracted from D96035.

DWARF Debuginfo classes have two representations for DIEs: DWARFDebugInfoEntry
(short) and DWARFDie(extended). Depending on the task, it might be more convenient
to use DWARFDebugInfoEntry or/and DWARFDie. DWARFUnit class already has methods
working with DWARFDie and DWARFDebugInfoEntry. This patch adds more
methods working with DWARFDebugInfoEntry to have paired functionality.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D126059
2022-07-29 16:40:17 +03:00
Jay Foad 3cfa9b1431 [AMDGPU] user-sgpr-init16-bug does not apply to gfx1103
Differential Revision: https://reviews.llvm.org/D130347
2022-07-29 14:21:13 +01:00
Simon Pilgrim af1b7ebcdf [TargetLowering] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC.
Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.
2022-07-29 14:20:35 +01:00
Matt Arsenault ef906f287e AMDGPU: Fix assertion when printing unreachable functions
Since 814a0abcce, this would break if we
had a function in the module that becomes dead in any codegen IR
pass. The function wasn't deleted since it was initially used in dead
code, but is detached from the call graph and doesn't appear in the PO
traversal. Do a second walk over the module to populate the resources
of any functions which weren't already processed.
2022-07-29 08:57:43 -04:00
Matt Arsenault a4834ad068 RegisterCoalescer: Shrink main range after shrinking subranges
If the subregister uses were dead, this would leave the main range
segment pointing to a deleted instruction.

Not sure if this should try to avoid shrinking if we know we don't
have dead components.
2022-07-29 08:57:28 -04:00
Alexander Timofeev d7ae1a9097 Revert "[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs"
This reverts commit 76d9ae924c.
because it causes several VK CTS tests to fail
2022-07-29 14:19:07 +02:00
Simon Pilgrim 641dba9e28 [DAG] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC.
Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.
2022-07-29 11:34:39 +01:00
Simon Pilgrim 9082c13106 [Support] Add KnownBits::concat method
Add a method for the various cases where we need to concatenate 2 KnownBits together (BUILD_PAIR and SHIFT_PARTS in particular) - uses the existing APInt::concat 'HiBits.concat(LoBits)' convention

Differential Revision: https://reviews.llvm.org/D130557
2022-07-29 11:06:39 +01:00
Jay Foad d03110155b [IR] Simplify Intrinsic::getDeclaration. NFC. 2022-07-29 10:45:22 +01:00
wanglei 56ab2f4ccd [LoongArch] Offset folding for frameindex
This patch is for frameindex calculations.

Differential Revision: https://reviews.llvm.org/D130248
2022-07-29 17:27:34 +08:00
wanglei fd6545322c [LoongArch] Refactor insertDivByZeroTrap
Ensure non-terminators don't follow terminators.
This patch fixes the `sdiv-udiv-srem-urem.ll` test failure with
expensive check.

Differential Revision: https://reviews.llvm.org/D130247
2022-07-29 17:06:49 +08:00
David Sherwood 487fa6f8c3 [AArch64][DAGCombine] Add performBuildVectorCombine 'extract_elt ~> anyext'
A build vector of two extracted elements is equivalent to an extract
subvector where the inner vector is any-extended to the
extract_vector_elt VT, because extract_vector_elt has the effect of an
any-extend.

  (build_vector (extract_elt_i16_to_i32 vec Idx+0) (extract_elt_i16_to_i32 vec Idx+1))
  => (extract_subvector (anyext_i16_to_i32 vec) Idx)

Depends on D130697

Differential Revision: https://reviews.llvm.org/D130698
2022-07-29 09:51:09 +01:00
Florian Hahn 214e2d8fe5
[SCEV] Avoid repeated proveNoSignedWrapViaInduction calls.
At the moment, proveNoSignedWrapViaInduction may be called for the
same AddRec a large number of times via getSignExtendExpr. This can have
a severe compile-time impact for very loop-heavy code.

If proveNoSignedWrapViaInduction failed to prove NSW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.

This is the signed version of 8daa338297 / D130648.

This can drastically improve compile-time in some excessive cases and
also has a slightly positive compile-time impact on CTMark:

NewPM-O3: -0.06%
NewPM-ReleaseThinLTO: -0.04%
NewPM-ReleaseLTO-g: -0.04%

https://llvm-compile-time-tracker.com/compare.php?from=8daa338297d533db4d1ae8d3770613eb25c29688&to=aed126a196e7a5a9803543d9b4d6bdb233d0009c&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130694
2022-07-29 09:15:03 +01:00
Sunho Kim e590f945c6 Revert "[JITLink][COFF] Implement include/alternatename linker directive."
This reverts commit f1fcd06a2a.

Faliures reported in
https://lab.llvm.org/buildbot/#/builders/193/builds/16143 and http://lab.llvm.org/buildbot/#/builders/91/builds/13010
2022-07-29 17:03:19 +09:00
Sunho Kim f1fcd06a2a [JITLink][COFF] Implement include/alternatename linker directive.
Implements include/alternatename linker directive. Alternatename is used by static msvc runtime library. Alias symbol is technically incorrect (we have to search for external definition) but we don't have a way to represent this in jitlink/orc yet, this is solved in the following up patch.

Inlcude linker directive is used in ucrt to forcelly lookup the static initializer symbols so that they will be emitted. It's implemented as extenral symbols with live flag on that cause the lookup of these symbols.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130276
2022-07-29 16:48:29 +09:00
Sunho Kim 049fd21b42 [JITLink][COFF][x86_64] Implement ADDR64 relocation.
Implements ADDR64 relocation using x86_64 edge kind.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130178
2022-07-29 16:32:07 +09:00
Sunho Kim 410e0aa759 [JITLink][COFF] Implement dllimport stubs.
Implements dllimport stubs using GOT table manager. Benefit of using GOT table manager is that we can just reuse jitlink-check architecture.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130175
2022-07-29 16:29:53 +09:00
Sunho Kim 6b27890b2c [ORC][COFF] Handle COFF import files of static archive.
Handles COFF import files of static archive. Changes static library genrator to build up object file map keyed by symbol name that excludes the symbols from dllimported symbols so that static generator will not be responsible for them. It exposes the list of dynamic libraries that need to be imported. Client should properly load the libraries in this list beforehand. Object file map is also an improvment from the past in terms of performance. Archive.findSym does a slow O(n) linear serach of symbol list to find the symbol. (we call findSym O(n) times, thus full time complexity is O(n^2); we were the only user of findSym function in fact)

There is a room for improvements in how to load the libraries in the list. We currently just hand the responsibility over to the clinet. A better way would be let ORC read this list and hand them over to JITLink side that would also help validation (e.g. not trying to generate stub for non dllimported targets) Nevertheless, we will have to exclude the symbols from COFF import object file list and need a way to access this list, which this patch offers.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D129952
2022-07-29 16:25:08 +09:00
Changpeng Fang 2b731b30a7 AMDGPU: Take care of "tied" operand when removeOperand
Summary:
  Flat scratch load of D16 type by default has tied vdst_in operand (with vdst). This should be taken
care of at the time of "removeOperand" in eliminateFrameIndex. Otherwise we will hit an assert saying
"Cannot move tied operands". This patch unties vdst_in before the move, and retie it with vdst afterwards.

Reviewers:
  arsenm, foad

Differential Revision: https://reviews.llvm.org/D130537
2022-07-28 17:30:49 -07:00
Francis Visoiu Mistrih bfd3883e83 [Matrix] Refactor transpose distribution. NFC
Use a function to distribute transposes. Preparation for future patches.
2022-07-28 17:30:00 -07:00
Felipe de Azevedo Piovezan 58526b2d2b [GlobalISel] Handle nullptr constants in dbg.value
Currently, the LLVM IR -> MIR translator fails to translate dbg.values
whose first argument is a null pointer. However, in other portions of
the code, such pointers are always lowered to the constant zero, for
example see IRTranslator::Translate(Constant, Register).

This patch addresses the limitation by following the same approach of
lowering null pointers to zero.

A prior test was checking that null pointers were always lowered to
$noreg; this test is changed to check for zero, and the previous
behavior is now checked by introducing a dbg.value whose first argument
is the address of a global variable.

Differential Revision: https://reviews.llvm.org/D130721
2022-07-28 14:58:14 -07:00
Felipe de Azevedo Piovezan 0ef6809c48 [GlobalISel][nfc] Remove unnecessary cast
The getOperand method already returns a Constant when it is called on
a ConstantExpression, as such the cast is not needed. To prevent a type
mismatch between the different return statements of the lambda, the
lambda return type is explicitly provided.

Differential Revision: https://reviews.llvm.org/D130719
2022-07-28 14:55:07 -07:00
Anshil Gandhi 5c38056431 [AMDGPU][Scheduler] Avoid initializing Register pressure tracker when tracking is disabled
When register pressure tracking is disabled, the scheduler attempts to load
pressures at SReg_32 and VGPR_32. This causes an index out of bounds error.
This patch fixes this issue by disabling the initialization of RPTracker
when not needed. NFC

Reviewed By: rampitec, kerbowa, arsenm

Differential Revision: https://reviews.llvm.org/D129322
2022-07-28 15:39:28 -06:00
David Blaikie 6139626d73 llvm-dwp: Include dwo name even when the input is a dwo
This still only includes the dwo name if it's in the DW_AT_dwo_name
attribute in the split unit - though it could be improved/modified to
use the dwo name from the command line (if linking raw dwo files) or
retrieved from the DW_AT_dwo_name in the executable (when using -e).

It's useful in any case because you might have a large command line with
many files and knowing exactly which dwo files are relevant will
simplify debugging, but especially with '-e' when you didn't pass the
dwo files explicitly in nthe first place it would be quite non-obvious
where the duplicate units are coming from.
2022-07-28 20:24:05 +00:00
Alexey Lapshin e74197bc12 [Reland][Debuginfo][llvm-dwarfutil] Add check for unsupported debug sections.
Current DWARFLinker implementation does not support some debug sections
(mainly DWARF v5 sections). This patch adds diagnostic for such sections.
The warning would be displayed for critical(such that could not be removed)
sections and the source file would be skipped. Other unsupported sections
would be removed and warning message should be displayed. The zero exit
status would be returned for both cases.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D123623
2022-07-28 21:29:16 +03:00
Austin Kerbow 0f93a45b11 [AMDGPU] Add isMeta flag to SCHED_GROUP_BARRIER 2022-07-28 11:04:33 -07:00
Fangrui Song c26dc2904b [llvm-objcopy] Support --{,de}compress-debug-sections for zstd
Also, add ELFCOMPRESS_ZSTD (2) from the approved generic-abi proposal:
https://groups.google.com/g/generic-abi/c/satyPkuMisk
("Add new ch_type value: ELFCOMPRESS_ZSTD")

Link: https://discourse.llvm.org/t/rfc-zstandard-as-a-second-compression-method-to-llvm/63399
("[RFC] Zstandard as a second compression method to LLVM")

Differential Revision: https://reviews.llvm.org/D130458
2022-07-28 10:45:53 -07:00
Austin Kerbow f5b21680d1 [AMDGPU] Add amdgcn_sched_group_barrier builtin
This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.

The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.

As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D128158
2022-07-28 10:43:14 -07:00
Craig Topper 2750873dfe [RISCV] Update lowerFROUND to use masked instructions.
This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsic so we technically don't have
to worry about fflags, but it doesn't cost much to support it.

To support I've extend our FCOPYSIGN_VL node to support a passthru
operand. Similar to what was done for VRGATHER*_VL nodes.

I plan to do a similar update for trunc, floor, and ceil.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D130659
2022-07-28 10:05:19 -07:00
Craig Topper 89173dee71 [RISCV] Remove duplicate code. NFC
The same operations are part of `FloatingPointVecReduceOps` a little
bit earlier.
2022-07-28 10:05:19 -07:00
Simon Pilgrim 8c99cef1e7 [DAG] Remove SelectionDAG::GetDemandedBits and use SimplifyMultipleUseDemandedBits directly.
GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there.

visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....
2022-07-28 17:03:44 +01:00
Philip Reames 82c1b136db [LV] Don't predicate uniform mem op stores unneccessarily
We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible.

Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the later. It would *not* be legal to make this same change for merely "uniform" operations.

Differential Revision: https://reviews.llvm.org/D130637
2022-07-28 08:55:52 -07:00
Liqiang Tao d52e775b05 [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 22:44:03 +08:00
Liqiang Tao c113594378 Revert "[llvm][ModuleInliner] Add inline cost priority for module inliner"
This reverts commit bb7f62bbbd.
2022-07-28 22:36:28 +08:00
Florian Hahn f912bab111
Revert "[X86][DAGISel] Don't widen shuffle element with AVX512"
This reverts commit 5fb4134210.

This patch is causing crashes when building llvm-test-suite when
optimizing for CPUs with AVX512.

Reproducer crashing with llc:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-apple-macosx"

    define i32 @test(<32 x i32> %0) #0 {
    entry:
      %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
      %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1)
      ret i32 %2
    }

    ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
    declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

    attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
    attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }
2022-07-28 15:26:42 +01:00
Simon Pilgrim be488ba7de [DAG] DAGCombiner::visitTRUNCATE - remove GetDemandedBits call
This should now all be handled by SimplifyDemandedBits.
2022-07-28 15:23:04 +01:00
Simon Pilgrim ea7f14dad0 [DAG] SelectionDAG::GetDemandedBits - don't simplify opaque constants
I'm actually trying to get rid of GetDemandedBits - but while dismantling it I noticed that we were altering opaque constants. Fixing that causes a FP_TO_INT_SAT regression that should be addressed separately - I'll raise a bug.
2022-07-28 14:46:59 +01:00
Liqiang Tao bb7f62bbbd [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 21:28:07 +08:00
Simon Pilgrim 69d5a038b9 [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits.

There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP.

Differential Revision: https://reviews.llvm.org/D77804
2022-07-28 14:10:44 +01:00
Amaury Séchet 474a8ee03d [DAG] Use recursivelyDeleteUnusedNodes in PromoteLoad
It simplifies the code overall and removes the need for manual bookkeeping.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130447
2022-07-28 12:54:52 +00:00
Amaury Séchet 7920805b27 [DAG] Use recursivelyDeleteUnusedNodes in ReplaceLoadWithPromotedLoad
It simplifies the code overall and removes the need for manual bookkeeping.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130444
2022-07-28 12:32:37 +00:00
Alexander Timofeev 76d9ae924c [AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs
In the 2e29b0138c we introduce a specific solving algorithm
that analyzes the VGPR to SGPR copies use chains and either lowers
the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms.
Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs
in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs
are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert
copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues.
At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain
were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF
because of the weird mixing of SALU and VALU instructions.
This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact
that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs,
we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common
VGPR to SGPR lowering algorithm.
This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130367
2022-07-28 14:30:29 +02:00
Sunho Kim 72ea1a721e [ORC] Fix weak hidden symbols failure on PPC with runtimedyld
Fix "JIT session error: Symbols not found: [ DW.ref.__gxx_personality_v0 ] error" which happens when trying to use exceptions on ppc linux. To do this, it expands AutoClaimSymbols option in RTDyldObjectLinkingLayer to also claim weak symbols before they are tried to be resovled. In ppc linux, DW.ref symbols is emitted as weak hidden symbols in the later stage of MC pipeline. This means when using IRLayer (i.e. LLJIT), IRLayer will not claim responsibility for such symbols and RuntimeDyld will skip defining this symbol even though it couldn't resolve corresponding external symbol.

Reviewed By: sgraenitz

Differential Revision: https://reviews.llvm.org/D129175
2022-07-28 21:12:25 +09:00
Dmitry Preobrazhensky 2b230d69ad [AMDGPU][MC][GFX90A] Correct MIMG dst size validation
Correct validator to enable MIMG dst size checks.

Differential Revision: https://reviews.llvm.org/D130512
2022-07-28 14:30:08 +03:00
Sanjay Patel 28ad5dc3f7 [InstCombine] try harder to narrow bitwise logic with cast operands
This works with any logic + extend:
https://alive2.llvm.org/ce/z/vzsqQD

The motivating case is from issue #56294, but that's still not optimal
(it should simplify completely).
2022-07-28 07:23:22 -04:00
Dmitry Preobrazhensky fa7fd8ec31 [AMDGPU][MC][GFX11] Disable SGPRs for src1 of v_fma_mix*_dpp opcodes
Differential Revision: https://reviews.llvm.org/D130634
2022-07-28 14:20:05 +03:00
chendewen 7eeb468ae5 [Aarch64] Add cost for missing extensions.
This patch adds a cost estimate for some missing sign extensions.
ref: https://reviews.llvm.org/D14730

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D130565
2022-07-28 17:34:00 +08:00
Florian Hahn 8daa338297
[SCEV] Avoid repeated proveNoUnsignedWrapViaInduction calls.
At the moment, proveNoUnsignedWrapViaInduction may be called for the
same AddRec a large number of times via getZeroExtendExpr. This can have
a severe compile-time impact for very loop-heavy code. One one
particular workload, LSR takes ~51s without this patch, almost
exlusively in proveNoUnsignedWrapViaInduction. With this patch, the time
in LSR drops to ~0.4s.

If proveNoUnsignedWrapViaInduction failed to prove NUW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.

Besides drastically improving compile-time in some excessive cases, this
also has a slightly positive compile-time impact on CTMark:

NewPM-O3: -0.07%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.06

https://llvm-compile-time-tracker.com/compare.php?from=b435da027d7774c24cdb8c88d09f6b771e07fb14&to=f2729e33e8284b502f6c35a43345272252f35d12&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130648
2022-07-28 10:02:19 +01:00
David Spickett a0ccba5e19 [llvm] Fix some test failures with EXPENSIVE_CHECKS and libstdc++
DebugLocEntry assumes that it either contains 1 item that has no fragment
or many items that all have fragments (see the assert in addValues).

When EXPENSIVE_CHECKS is enabled, _GLIBCXX_DEBUG is defined. On a few machines
I've checked, this causes std::sort to call the comparator even
if there is only 1 item to sort. Perhaps to check that it is implemented
properly ordering wise, I didn't find out exactly why.

operator< for a DbgValueLoc will crash if this happens because the
optional Fragment is empty.

Compiler/linker/optimisation level seems to make this happen
or not. So I've seen this happen on x86 Ubuntu but the buildbot
for release EXPENSIVE_CHECKS did not have this issue.

Add an explicit check whether we have 1 item.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D130156
2022-07-28 08:53:38 +00:00
Max Kazantsev 2d1c6e0b44 [LAA] Remove block order sensitivity in LAA algorithm. PR56672
As test in PR56672 shows, LAA produces different results which lead to either
positive or negative vectorization decisions depending on the order of blocks
in loop. The exact reason of this is not clear to me, however this makes investigation
of related bugs extremely complex.

Current order of blocks in the loop is arbitrary. It may change, for example, if loop
info analysis is dropped and recomputed. Seems that it interferes with LAA's logic.
This patch chooses fixed traversal order of blocks in loops, making it RPOT.

Note: this is *not* a fix for bug with incorrect analysis result. It just makes
the answer more robust to make the investigation easier.

Differential Revision: https://reviews.llvm.org/D130482
Reviewed By: aeubanks, fhahn
2022-07-28 13:36:56 +07:00
Phoebe Wang 726d9f8e8c [X86][MC] Avoid emitting incorrect warning for complex FMUL
We will insert a new operand which is identical to the Dest for complex
FMUL with a mask. https://godbolt.org/z/eTEdnYv3q

Complex FMA and FMUL with maskz don't have this problem.

Reviewed By: LuoYuanke, skan

Differential Revision: https://reviews.llvm.org/D130638
2022-07-28 13:58:34 +08:00
Austin Kerbow ba0d079c7a [AMDGPU] Aggressively schedule to reduce RP in occupancy limited regions
By not clustering loads and adjusting heuristics to more aggressively reduce
register pressure we may be able to increase occupancy for the function if it
was dropped in a first pass scheduling.

Similarly, try to reduce spilling if register usage exceeds lower bound
occupancy.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130329
2022-07-27 22:34:37 -07:00
Amara Emerson 93e3aeb9a8 [AArch64][GlobalISel] Fix custom legalization of rotates using sext for shift vs zext.
Rotates are defined according to DAG documentation as having unsigned shifts,
so we need to zero-extend instead of sign-extend here.

Fixes issue 56664
2022-07-27 22:10:42 -07:00
Carl Ritson dbda30e294 [AMDGPU][SIFoldOperands] Clear kills when folding COPY
Clear all kill flags on source register when folding a COPY.
This is necessary because the kills may now be out of order with the uses.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130622
2022-07-28 11:57:55 +09:00
Craig Topper a304d70ee9 [RISCV] Reorder (and/or/xor (shl X, C1), C2) if we can form ANDI/ORI/XORI.
InstCombine and DAGCombine prefer to keep shl before binops.

This patch teaches isel to convert to (shl (and/or/xor X, C1 >> C2), C2)
if (C1 >> C2) is a simm12. The idea was taken from X86's isel code.

There's a special case implemented for a sext_inreg between the
shift and the binop.

Differential Revision: https://reviews.llvm.org/D130610
2022-07-27 17:35:26 -07:00
Craig Topper 1d1d8d6025 [RISCV] Reorder code in lowerFROUND to make the diff in D130659 cleaner. NFC 2022-07-27 17:13:04 -07:00
Matt Arsenault bfdca1535c RegAllocGreedy: Fix nondeterminism in tryLastChanceRecoloring
tryLastChanceRecoloring iterates over the set of LiveInterval pointers
and used that to seed the recoloring stack, which was
nondeterministic. Fixes a future test failing about 20% of the time.

This just takes the order the interfering vreg was encountered. Not
sure if we should try to order this more intelligently.
2022-07-27 19:02:06 -04:00
Craig Topper 98647330bf [RISCV] Add merge operand to RISCVISD::FCOPYSIGN_VL.
Similar to what was done for VRGATHER*_VL recently.

This will be used in D130659.
2022-07-27 15:25:34 -07:00
Paul Kirth 6e9bab71b6 Revert "[llvm][NFC] Refactor code to use ProfDataUtils"
This reverts commit 300c9a7881.

We will reland once these issues are ironed out.
2022-07-27 21:38:11 +00:00
Paul Kirth 300c9a7881 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-07-27 21:13:54 +00:00
Paul Kirth 6047deb7c2 [llvm] Provide utility function for MD_prof
Currently, there is significant code duplication for dealing with
MD_prof metadata throughout the compiler. These utility functions can
improve code reuse and simplify boilerplate code when dealing with
profiling metadata, such as branch weights. The inent is to provide a
uniform set of APIs that allow common tasks, such as identifying
specific types of MD_prof metadata and extracting branch weights.

Future patches can build on this initial implementation and clean up the
different implementations across the compiler.

Reviewed By: bogner

Differential Revision: https://reviews.llvm.org/D128858
2022-07-27 21:13:51 +00:00
Adrian Prantl 719ab04acf [GlobalISel] Handle IntToPtr constants in dbg.value
Currently, the IR to MIR translator can only handle two kinds of constant
inputs to dbg.values intrinsics: constant integers and constant floats. In
particular, it cannot handle pointers created from IntToPtr ConstantExpression
objects.

This patch addresses the limitation above by replacing the IntToPtr with
its input integer prior to converting the dbg.value input.

Patch by Felipe Piovezan!

Differential Revision: https://reviews.llvm.org/D130642
2022-07-27 13:42:07 -07:00
Philip Reames 15c645f7ee [RISCV] Enable (scalable) vectorization by default
This change enables vectorization (using scalable vectorization only, fixed vectors are not yet enabled) for RISCV when vector instructions are available for the target configuration.

At this point, the resulting configuration should be both stable (e.g. no crashes), and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible.

This exposes a large amount of code which hasn't otherwise been on by default, and thus may not have been fully exercised.  Given that, having issues fall out is not unexpected.  If you find issues, please make sure to include as much information as you can when reverting this change.

Differential Revision: https://reviews.llvm.org/D129013
2022-07-27 12:36:04 -07:00
Stanislav Mekhanoshin 68901fdbeb [AMDGPU] Consider S_SETPRIO a scheduling boundary
The instruction is used to modify wave priority with the intent
to affect VALU execution and currently we can reschedule VALU
around it since that VALU does not have side effects.

Differential Revision: https://reviews.llvm.org/D130654
2022-07-27 11:50:23 -07:00
Jonas Devlieghere a8c3d9815e
[DebugInfo] Teach LLVM and LLDB about ptrauth in DWARF
Teach libDebugInfo (llvm-dwarfdump) and lldb about DWARF tags and
attributes for pointer authentication. These values have been emitted by
Apple clang for several releases. Although upstream LLVM doesn't emit
these values yet, we hope to upstream that part sometime soon.

Differential revision: https://reviews.llvm.org/D130215
2022-07-27 11:48:35 -07:00
Amara Emerson 65246d3eb4 Use hasNItemsOrLess() in MRI::hasAtMostUserInstrs(). 2022-07-27 11:42:14 -07:00
Florian Hahn 16e0620d6d
[VPlan] Mark VPPredInstPHIRecipe as not having side-effects.
Now that all uses of VPPredInstPHIRecipes are properly modeled, they can
be treated as not having side-effects, enabling removal.
2022-07-27 19:29:26 +01:00
Mingming Liu 34348814e1 [AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.pmull64
Without this, the intrinsic will be expanded to an integer; thereby an
explicit copy (from GPR to SIMD register) will be codegen'd. This matches the
general convention of using "v1" types to represent scalar integer operations in
vector registers.

The similar approach is observed in D56616, and the pattern likely applies on
other intrinsic that accepts integer scalars (e.g.,
int_aarch64_neon_sqdmulls_scalar)

Differential Revision: https://reviews.llvm.org/D130548
2022-07-27 11:11:16 -07:00
Amara Emerson 19cdd1908b [AArch64][GlobalISel] Add heuristics for localizing G_CONSTANT.
This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of
materializing a specific constant in code size. Doing so prevents us from
sinking constants which require multiple instructions to generate into
use blocks.

Code size savings on CTMark -Os:
Program                                       size.__text
                                              before         after           diff
ClamAV/clamscan                               381940.00      382052.00       0.0%
lencod/lencod                                 428408.00      428428.00       0.0%
SPASS/SPASS                                   411868.00      411876.00       0.0%
kimwitu++/kc                                  449944.00      449944.00       0.0%
Bullet/bullet                                 463588.00      463556.00      -0.0%
sqlite3/sqlite3                               284696.00      284668.00      -0.0%
consumer-typeset/consumer-typeset             414492.00      414424.00      -0.0%
7zip/7zip-benchmark                           595244.00      594972.00      -0.0%
mafft/pairlocalalign                          247512.00      247368.00      -0.1%
tramp3d-v4/tramp3d-v4                         372884.00      372044.00      -0.2%
                           Geomean difference                               -0.0%

Differential Revision: https://reviews.llvm.org/D130554
2022-07-27 10:51:16 -07:00
Stanislav Mekhanoshin 0562cf442f Allow data prefetch into non-default address space
I am playing with the LoopDataPrefetch pass and found out that it
bails to work with a pointer in a non-zero address space. This
patch adds the target callback to check if an address space is to
be considered for prefetching. Default implementation still only
allows address space 0, so this is NFCI.

This does not currently affect any known targets, but seems to be
generally useful for the future.

Differential Revision: https://reviews.llvm.org/D129795
2022-07-27 10:01:26 -07:00
Eli Friedman 1a6d82b93f Fix misc uses of "long" variables to use "int64_t".
I don't have any evidence these particular uses are actually causing any
issues, but we should avoid accidentally truncating immediate values
depending on the host.
2022-07-27 09:47:19 -07:00
Craig Topper 32622d6de4 [RISCV] Add isel pattern for (mul (and X, 0xffffffff), 3<<C) with Zba.
We can use slli.uw by C followed by sh1add. Similar can be done
for multiples of 5 and 9. We need to make sure that C is less than
32 to stay in bounds of the 5-bit immediate for slli.uw.

We have existing patterns for (mul X, 3<<C) that use sh1add
followed by slli. That order doesn't allow the and to be folded.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130146
2022-07-27 09:41:59 -07:00
Craig Topper 9b27d13204 [RISCV] Disable constant hoisting for multiply by negated power of 2.
A mul by a negated power of 2 is a slli followed by neg. This doesn't
require any constant materialization and may be lower latency than mul.
The neg may also be foldable into other arithmetic.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130047
2022-07-27 09:37:59 -07:00
Umesh Kalappa f38ea84a9f [PowerPC] Change long to int64_t (which is always 64 bit or 8 bytes )
We can't guarantee the long always 64 bits like WINDOWS or LLP64 data
model (rare but we should consider).

So use int64_t from inttypes.h and safe in this case.

Fixes https://github.com/llvm/llvm-project/issues/55911 .
2022-07-27 09:34:45 -07:00
Sanjay Patel e079bf6558 [AggressiveInstCombine] check sqrt operand to allow more libcall->intrinsic transforms
This should fix issue #56383 (at least when compiled with -O3 because this pass is only
run at -O3 currently).
2022-07-27 11:36:13 -04:00
Dmitri Gribenko b435da027d [amdgpu][nfc] Fix build with a certan Clang version
It errors out in the Bazel CI:

AMDGPULowerModuleLDSPass.cpp:384:12: error: chosen constructor is
explicit in copy-initialization
    return {SGV, std::move(Map)};

Reviewed By: rupprecht

Differential Revision: https://reviews.llvm.org/D130623
2022-07-27 17:29:36 +02:00
Joseph Huber b08369f7f2 Revert "[OpenMP] Remove noinline attributes in the device runtime"
The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that when we use
`-mlink-builtin-bitcode` we only import used symbols from the runtime.
Then OpenMPOpt will insert calls to symbols that were not previously
included. This patch removed this implicit behaviour as these functions
were kept alive by the `noinline` simply because it kept calls to them
in the module. This caused regression in some tests that relied on some
OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but
will try to fix it more correctly on main.

This reverts commit d61d72dae6.

Fixes #56752
2022-07-27 11:09:18 -04:00
Quinn Pham b6cc5ddc94 [libLTO] Set data-sections by default in libLTO.
This patch changes legacy LTO to set data-sections by default. The user can
explicitly unset data-sections. The reason for this patch is to match the
behaviour of lld and gold plugin. Both lld and gold plugin have data-sections on
by default.

This patch also fixes the forwarding of the clang options -fno-data-sections and
-fno-function-sections to libLTO. Now, when -fno-data/function-sections are
specified in clang, -data/function-sections=0 will be passed to libLTO to
explicitly unset data/function-sections.

Reviewed By: w2yehia, MaskRay

Differential Revision: https://reviews.llvm.org/D129401
2022-07-27 09:39:39 -05:00
Quinn Pham 70ec8cd024 Revert "[libLTO] Set data-sections by default in libLTO."
This reverts commit f565444b48.
2022-07-27 08:47:00 -05:00
Quinn Pham f565444b48 [libLTO] Set data-sections by default in libLTO.
This patch changes legacy LTO to set data-sections by default. The user can
explicitly unset data-sections. The reason for this patch is to match the
behaviour of lld and gold plugin. Both lld and gold plugin have data-sections on
by default.

This patch also fixes the forwarding of the clang options -fno-data-sections and
-fno-function-sections to libLTO. Now, when -fno-data/function-sections are
specified in clang, -data/function-sections=0 will be passed to libLTO to
explicitly unset data/function-sections.

Reviewed By: w2yehia, MaskRay

Differential Revision: https://reviews.llvm.org/D129401
2022-07-27 08:34:40 -05:00
Simon Pilgrim c0b3f7a50f [DAG] SimplifyDemandedBits - ensure we clear known One bits that AssertZext asserts are really known Zero
Matches ComputeKnownBits behaviour

Thanks to @uabelho for the fuzz regression report on D129765
2022-07-27 13:57:47 +01:00
Aaron Kogon dd3ca65c37 Sinking or hoisting instructions between loops before fusion
Instructions between two adjacent loops will be hoisted above the first
loop, or sunk below the second to facilitate loop fusion. Hoisting will
be attempted for an instruction that dominates the first loop.
Otherwise, sinking this instructions will be attempted.

Instructions with side effects will not be considered for sinking or
hoisting. Hoisting/sinking of any instructions between loops will only
be performed if all the instructions can be moved. As well,
sinking/hoisting is considered for each instruction in isolation,
without taking into account sinking/hoisting decisions for other
instructions in the preheader.

Differential Revision: https://reviews.llvm.org/D118076
2022-07-27 06:55:09 -04:00
LiaoChunyu bf4f9a468a [RISCV]Enable isIntDivCheap when attribute is minsize
Don't expand divisions by constants when attribute is minsize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130543
2022-07-27 18:22:51 +08:00
Rainer Orth 979ddfff37 [Support] Handle SPARC in sys::getHostCPUName
While working on D118450 <https://reviews.llvm.org/D118450>, I noticed that
`sys::getHostCPUName` lacks SPARC support.

This patch implements it.  The code is taken from/inspired by GCC's
`gcc/config/sparc/driver-sparc.cc`.  There's one caveat: since LLVM, unlike
GCC, doesn't support the SPARC-M7, -S7, and -M8 CPUs, I map all those to
the latest supported one (UltraSparc T4/`niagara4`).

Tested on `sparcv9-sun-solaris2.11` and `sparc64-unknown-linux-gnu` by
running `savcov --version` on

- Netra SPARC S7-2 (SPARC-S7, Solaris 11.4)
- SPARC T5-2 (SPARC T5, Solaris 11.4)
- SPARC Enterprise T5220 (UltraSPARC T2, Solaris 11.3)
- SPARC T5 (UltraSPARC T5, Debian sid)
- SPARC T3 (UltraSPARC T3, Debian sid)
- SPARC Enterprise T5220 (Debian sid)

Differential Revision: https://reviews.llvm.org/D130272
2022-07-27 12:21:03 +02:00
Simon Pilgrim 529bd4f352 [DAG] SimplifyDemandedBits - don't early-out for multiple use values
SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits.

@lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current increase codegen is not a major concern.

Differential Revision: https://reviews.llvm.org/D129765
2022-07-27 10:54:06 +01:00
Zi Xuan Wu (Zeson) 70b8b738c5 [CSKY] Fix the btsti16 instruction missing in generic processor
Normally, generic processor does not have any SubtargetFeature. And it
can just generate most basic instructions which have no Predicates to
guard.

But it needs to enbale predicate for the btsti16 instruction as one of the most basic instructions.
Or the generic processor can't finish codegen process. So Add FeatureBTST16 SubtargetFeature to generic ProcessorModel.
2022-07-27 17:39:15 +08:00
Alexey Lapshin 79ff02a122 Revert "[Debuginfo][llvm-dwarfutil] Add check for unsupported debug sections."
This reverts commit 0d191b7553.
2022-07-27 11:48:56 +03:00
Ying Yi 8d46cb343f Emit a simple StackSizesSection on PS4.
Differential Revision: https://reviews.llvm.org/D130495
2022-07-27 09:39:24 +01:00
David Green 39f8384964 [ARM] Correct features on pacbti instructions.
Given a patch like D129506, using instructions not valid for the current
feature set becomes an error. This updates the Arm hint-space
instructions for pac/bti to require thumbv7m as opposed to 8.1-m.main, to
make them valid when compiling for thumbv7m with -mbranch-protection.

Differential Revision: https://reviews.llvm.org/D129692
2022-07-27 09:15:14 +01:00
Nikita Popov b1b1086973 [ARM] Add target feature to force 32-bit atomics
This adds a +atomic-32 target feature, which instructs LLVM to assume
that lock-free 32-bit atomics are available for this target, even
if they usually wouldn't be.

If only atomic loads/stores are used, then this won't emit libcalls.
If atomic CAS is used, then the user is responsible for providing
any necessary __sync implementations (e.g. by masking interrupts
for single-core privileged use cases).

See https://reviews.llvm.org/D120026#3674333 for context on this
change. The tl;dr is that the thumbv6m target in Rust has
historically made atomic load/store only available, which is
incompatible with the change from D120026, which switched these to
use libatomic.

Differential Revision: https://reviews.llvm.org/D130480
2022-07-27 10:00:31 +02:00
Amara Emerson 9cc1dd209d [AArch64][GlobalISel] Lower vector G_CTTZ.
Fixes issue 56398
2022-07-27 00:14:30 -07:00
Kirill Stoimenov d6e1e0a019 [ASan] Use stack safety analysis to optimize allocas instrumentation.
Added alloca optimization which was missed during the implemenation of D112098.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130503
2022-07-26 18:48:16 -07:00
Jon Chesterfield 3ccd88f209 [amdgpu][nfc] Separate processUsedLDS into independent pieces, rename it 2022-07-27 01:55:43 +01:00
Jon Chesterfield 9981afdd42 [amdgpu][nfc] Extract kernel annotation from processUsedLDS 2022-07-27 01:38:41 +01:00
Tom Stellard fd84d97ba6 Revert "[Support] Workaround compiler bug in MSVC"
This reverts commit ec8f4fd68c.

This caused a failure in the mlir-windows bot.
2022-07-26 15:49:35 -07:00
Dmitry Vassiliev e3e63f30a5 [CodeGen] Fixed ambiguous symbol ExtAddrMode in case of NDEBUG and LLVM_ENABLE_DUMP
This patch fixes the following error with MSVC 16.9.2 in case of NDEBUG and LLVM_ENABLE_DUMP:
llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2872: 'ExtAddrMode': ambiguous symbol
llvm/include/llvm/CodeGen/TargetInstrInfo.h(86): note: could be 'llvm::ExtAddrMode'
llvm/lib/CodeGen/CodeGenPrepare.cpp(2447): note: or '`anonymous-namespace'::ExtAddrMode'
llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2039: 'print': is not a member of 'llvm::ExtAddrMode'

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D130426
2022-07-27 00:21:57 +02:00
Martin Sebor 4447603616 [InstCombine] Fold strtoul and strtoull and avoid PR #56293
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D129224
2022-07-26 14:11:40 -06:00
Jon Chesterfield 923b90bddb [amdgpu][nfc] Separate LDS struct creation from RAUW 2022-07-26 20:59:17 +01:00
Tom Stellard ec8f4fd68c [Support] Workaround compiler bug in MSVC
https://developercommunity.visualstudio.com/t/Prev-Issue---with-__assume-isnan-/1597317

This was causing unittest failures on Windows for the GitHub actions
based CI we use in the release branches.

Failed Tests (2):
  LLVM-Unit :: Support/./SupportTests.exe/FormatVariadicTest.BigTest
  LLVM-Unit :: Support/./SupportTests.exe/NativeFormatTest.BoundaryTests

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D129822
2022-07-26 12:55:10 -07:00
Sanjay Patel e3205b8765 [AggressiveInstCombine] convert sqrt libcalls with "nnan" to sqrt intrinsics
This is an alternate to D129155 that uses TTI.haveFastSqrt() to avoid a
potential miscompile for programs with reads of errno. Moving the transform
to AggressiveInstCombine provides access to TTI.

If a sqrt call has "nnan", that implies that the input argument is never
negative because sqrt of {negative number} --> NAN.
If the argument is never negative and the call can be lowered without a
libcall, then we can assume that errno accesses are unchanged after lowering,
so the call can be translated to the LLVM intrinsic (which is expected to
become inline code).

This affects codegen for targets like x86 that have sqrt instructions, but
still have to conservatively assume that a libcall may be needed to set
errno as shown in issue #52620 and issue #56383.

This patch won't solve those examples - we will need to extend this to use
CannotBeOrderedLessThanZero or similar, enhance that analysis for new
operators, and/or deal with llvm.assume too.

Differential Revision: https://reviews.llvm.org/D129167
2022-07-26 15:50:14 -04:00
Sanjay Patel dcd09467b0 [InstSimplify] remove redundant calls to 'isImplied'; NFCI
We already call the more general isImpliedCondition() (which calls
isImpliedTrueByMatchingCmp() internally) from simplifyAndInst()
and simplifyOrInst().

There was a difference visible with this change on a vector test
before a925bef70c, but I can't find any gaps now.
2022-07-26 14:47:21 -04:00
Francis Visoiu Mistrih 448a094d3e [Matrix] Add assert to catch extracted vectors with poison elements
Assert when the extracted vector is wider than the row/column.

Differential Revision: https://reviews.llvm.org/D130173
2022-07-26 11:07:02 -07:00
Craig Topper 3a2d7d8ad5 [RISCV] Add Predicate to c.lw/c.sw/c.lwsp/c.swsp InstAliases with no offset.
These are aliases that allow the immediate offset to be ommitted.
We had predicates for the RV64, RV32+F, and D versions, but
not the base versions.

I've also re-ordered them to share Predicate lines to improve
readability.
2022-07-26 11:06:00 -07:00
Francis Visoiu Mistrih 2c6e8b4636 [Matrix] Refactor tiled loops in a struct. NFC
The three loops have the same structure: index, header, latch.
2022-07-26 11:02:22 -07:00
Jon Chesterfield 26dcc7e64a [amdgpu][nfc] Skip operations on padding fields in LDS struct 2022-07-26 18:31:02 +01:00
Fangrui Song f106525de2 [MachineFunctionPass] Support -print-changed and -print-changed=quiet
-print-changed for new pass manager is handy beside -print-after-all.
Port it to MachineFunctionPass.

Note: lib/Passes/StandardInstrumentations.cpp implements a number of
misc features. If we want to use them for codegen, we may need to lift
some functionality to LLVMIR.

Reviewed By: aeubanks, jamieschmeiser

Differential Revision: https://reviews.llvm.org/D130434
2022-07-26 10:16:49 -07:00
Simon Pilgrim 1ea7b9c6ee [DAG] matchRotateSub - set demanded bits to the shift amount type size, not the shift result size.
This should fix a report on D130251 of an assert due to a bitwidth mismatch in APInt::isSubSetOf
2022-07-26 17:58:51 +01:00
Austin Kerbow 7ca9e471fe [AMDGPU] Start refactoring GCNSchedStrategy
Tries to make the different scheduling stages a bit more self contained and
modifiable. Intended to be NFC. Preface to other changes.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130147
2022-07-26 08:55:19 -07:00
Stefan Gränitz 1e30820483 [WinEH] Apply funclet operand bundles to nounwind intrinsics that lower to function calls in the course of IR transforms
WinEHPrepare marks any function call from EH funclets as unreachable, if it's not a nounwind intrinsic or has no proper funclet bundle operand. This
affects ARC intrinsics on Windows, because they are lowered to regular function calls in the PreISelIntrinsicLowering pass. It caused silent binary truncations and crashes during unwinding with the GNUstep ObjC runtime: https://github.com/gnustep/libobjc2/issues/222

This patch adds a new function `llvm::IntrinsicInst::mayLowerToFunctionCall()` that aims to collect all affected intrinsic IDs.
* Clang CodeGen uses it to determine whether or not it must emit a funclet bundle operand.
* PreISelIntrinsicLowering asserts that the function returns true for all ObjC runtime calls it lowers.
* LLVM uses it to determine whether or not a funclet bundle operand must be propagated to inlined call sites.

Reviewed By: theraven

Differential Revision: https://reviews.llvm.org/D128190
2022-07-26 17:52:43 +02:00
Paul Walker e5c892dd85 [SVE][SelectionDAG] Use INDEX to generate matching instances of BUILD_VECTOR.
This patch starts small, only detecting sequences of the form
<a, a+n, a+2n, a+3n, ...> where a and n are ConstantSDNodes.

Differential Revision: https://reviews.llvm.org/D125194
2022-07-26 15:28:37 +00:00
Alexander Yermolovich f8df811471 [DWP][DWARF] Detect and error on debug info offset overflow
Right now we silently overflow uint32_t for debug_indfo sections. Added a check
and error out.

Differential Revision: https://reviews.llvm.org/D130395
2022-07-26 08:18:59 -07:00
Arthur Eubanks 2eade1dba4 [WPD] Use new llvm.public.type.test intrinsic for potentially publicly visible classes
Turning on opaque pointers has uncovered an issue with WPD where we currently pattern match away `assume(type.test)` in WPD so that a later LTT doesn't resolve the type test to undef and introduce an `assume(false)`. The pattern matching can fail in cases where we transform two `assume(type.test)`s into `assume(phi(type.test.1, type.test.2))`.

Currently we create `assume(type.test)` for all virtual calls that might be devirtualized. This is to support `-Wl,--lto-whole-program-visibility`.

To prevent this, all virtual calls that may not be in the same LTO module instead use a new `llvm.public.type.test` intrinsic in place of the `llvm.type.test`. Then when we know if `-Wl,--lto-whole-program-visibility` is passed or not, we can either replace all `llvm.public.type.test` with `llvm.type.test`, or replace all `llvm.public.type.test` with `true`. This prevents WPD from trying to pattern match away `assume(type.test)` for public virtual calls when failing the pattern matching will result in miscompiles.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D128955
2022-07-26 08:01:08 -07:00
Dmitry Preobrazhensky e43621b09c [AMDGPU][MC][GFX11] Correct src0 for VOP3_DPP variants of v_cmp*class* opcodes
Disable SGPRs for src0 of these opcodes.

Differential Revision: https://reviews.llvm.org/D130486
2022-07-26 17:52:34 +03:00
Dmitry Preobrazhensky 0eb9f18520 [AMDGPU][MC][GFX11] Correct encoding of VOP3/VOP3_DPP v_cmpx* opcodes
Encode dst=EXEC but allow disassembler accept any dst value.

Differential Revision: https://reviews.llvm.org/D130345
2022-07-26 17:36:22 +03:00
Sander de Smalen a41ddf178e [AArch64][SVE] Sink ptrue into loop if it is used by PTEST.
This helps fold away the ptest instructions, which needs the knowledge on whether
the general predicate is known to zero the inactive lanes.

This fixes some PTEST regressions introduced by D129282.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D129852
2022-07-26 15:07:41 +01:00
Sander de Smalen 370ff43a15 [AArch64][SVE] Consider more intrinsics in 'isZeroingInactiveLanes'.
This fixes some PTEST regressions introduced by D129282.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D129851
2022-07-26 15:07:41 +01:00
wangpc 1a7078d106 [DAGCombine] Mask doesn't have to be (EltSize - 1) exactly when combining rotation
I think what we need is the least Log2(EltSize) significant bits are known to be ones.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130251
2022-07-26 21:14:45 +08:00
Alexey Lapshin 0d191b7553 [Debuginfo][llvm-dwarfutil] Add check for unsupported debug sections.
Current DWARFLinker implementation does not support some debug sections
(mainly DWARF v5 sections). This patch adds diagnostic for such sections.
The warning would be displayed for critical(such that could not be removed)
sections and the source file would be skipped. Other unsupported sections
would be removed and warning message should be displayed. The zero exit
status would be returned for both cases.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D123623
2022-07-26 15:25:51 +03:00
Sven van Haastregt c8d91b07bb Reassoc FMF should not optimize FMA(a, 0, b) to (b)
Optimizing (a * 0 + b) to (b) requires assuming that a is finite and not
NaN. DAGCombiner will do this optimization when the reassoc fast math
flag is set, which is not correct. Change DAGCombiner to only consider
UnsafeMath for this optimization.

Differential Revision: https://reviews.llvm.org/D130232

Co-authored-by: Andrea Faulds <andrea.faulds@arm.com>
2022-07-26 09:39:12 +01:00
Simon Tatham 55f1fbf005 [MC,llvm-objdump,ARM] Target-dependent disassembly resync policy.
Currently, when llvm-objdump is disassembling a code section and
encounters a point where no instruction can be decoded, it uses the
same policy on all targets: consume one byte of the section, emit it
as "<unknown>", and try disassembling from the next byte position.

On an architecture where instructions are always 4 bytes long and
4-byte aligned, this makes no sense at all. If a 4-byte word cannot be
decoded as an instruction, then the next place that a valid
instruction could //possibly// be found is 4 bytes further on.
Disassembling from a misaligned address can't possibly produce
anything that the code generator intended, or that the CPU would even
attempt to execute.

This patch introduces a new MCDisassembler virtual method called
`suggestBytesToSkip`, which allows each target to choose its own
resynchronization policy. For Arm (as opposed to Thumb) and AArch64,
I've filled in the new method to return a fixed width of 4.

Thumb is a more interesting case, because the criterion for
identifying 2-byte and 4-byte instruction encodings is very simple,
and doesn't require the particular instruction to be recognized. So
`suggestBytesToSkip` is also passed an ArrayRef of the bytes in
question, so that it can take that into account. The new test case
shows Thumb disassembly skipping over two unrecognized instructions,
and identifying one as 2-byte and one as 4-byte.

For targets other than Arm and AArch64, this is NFC: the base class
implementation of `suggestBytesToSkip` still returns 1, so that the
existing behavior is unchanged. Other targets can fill in their own
implementations as they see fit; I haven't attempted to choose a new
behavior for each one myself.

I've updated all the call sites of `MCDisassembler::getInstruction` in
llvm-objdump, and also one in sancov, which was the only other place I
spotted the same idiom of `if (Size == 0) Size = 1` after a call to
`getInstruction`.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D130357
2022-07-26 09:35:30 +01:00
Phoebe Wang 19c5638e4f [ArgPromotion] Transfer metadata nontemporal to promoted loads
Fixes #56703

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130536
2022-07-26 16:30:08 +08:00
Cullen Rhodes 6082051da1 [AArch64][SVE] Add patterns to select mla/mls
Adds patterns for:

  add(a, select(mask, mul(b, c), splat(0))) -> mla(a, mask, b, c)
  sub(a, select(mask, mul(b, c), splat(0))) -> mls(a, mask, b, c)

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130492
2022-07-26 07:52:44 +00:00
Victor Campos b43bec19b9 [ARM] Add Tag_CPU_arch missing value descriptions in attribute parser
The ARM attribute parser for Tag_CPU_arch is missing value descriptions
for Armv8-A and Armv8-R.

This patch adds these descriptions.

Reviewed By: pratlucas

Differential Revision: https://reviews.llvm.org/D129631
2022-07-26 08:32:40 +01:00
Weining Lu 904a87ace3 [LoongArch] Use `end namespace xxx` style comment. NFC 2022-07-26 15:01:29 +08:00
Kazu Hirata 3f3930a451 Remove redundaunt virtual specifiers (NFC)
Identified with tidy-modernize-use-override.
2022-07-25 23:00:59 -07:00
Sinan Lin 7a2f5dca09 [CodeMetrics] use hasOneLiveUse instead of hasOneUse while analyzing inlinable callsites
It would be better for CodeMetrics to use hasOneLiveUse while analyzing
static and called once callsites, since inline cost now uses
hasOneLiveUse instead of hasOneUse to avoid overpessimization on dead
constant cases (since this patch https://reviews.llvm.org/D109294).

This change has no noticeable influence now, but it helps improve the
accuracy of cost models of passes that use CodeMetrics.

Reviewed By: fhahn, nikic

Differential Revision: https://reviews.llvm.org/D130461
2022-07-26 13:46:19 +08:00
zhongyunde d485c1b73e [LoopDataPrefetch] Fix crash when TTI doesn't set CacheLineSize
Fix https://github.com/llvm/llvm-project/issues/56681

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130418
2022-07-26 13:08:42 +08:00
Xiang Li 57006b14fa [DirectX backend] [NFC]Add DXILOpBuilder to generate DXIL operation
A new helper class DXILOpBuilder is added to create DXIL op function calls.

TableGen backend for DXILOperation will create table for DXIL op function parameter types.
When create DXIL op function, these parameter types will used to create the function type.

Reviewed By: bogner

Differential Revision: https://reviews.llvm.org/D130291
2022-07-25 21:49:59 -07:00
Sunho Kim cd953e4ffc [JITLink][COFF] Don't dead strip seh frame of exported function.
Adds keep-alive edges to pdata section to prevent dead strip of block when its parent function is alive.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D129945
2022-07-26 13:04:12 +09:00
Luo, Yuanke 5fb4134210 [X86][DAGISel] Don't widen shuffle element with AVX512
Currently the X86 shuffle lowering would widen the element type for
shuffle if the mask element value is adjacent. For below example

  %t2 = add nsw <16 x i32> %t0, %t1
  %t3 = sub nsw <16 x i32> %t0, %t1
  %t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
                      <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
                       i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
                       i32 11, i32 12, i32 13, i32 14, i32 15>

  ret <16 x i32> %t4

Compiler would transform the shuffle to
  %t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
                      <8 x i64> <i32 8, i32 1, i32 2, i32 3, i32 4,
                                 i32 5, i32 6, i32 7>
This may lose the oppotunity to let ISel select mask instruction when
avx512 is enabled.

This patch is to prevent the tranform when avx512 feature is enabled.
Thank Simon for the idea.

Differential Revision: https://reviews.llvm.org/D129537
2022-07-26 11:56:03 +08:00
Sunho Kim df344e1f44 [JITLink][COFF] Implement IMAGE_COMDAT_SELECT_LARGEST partially.
Implement IMAGE_COMDAT_SELECT_LARGEST partially. It's going to fail if larger symbol appears but this hasn't happened at least in vcruntime library.

We probably would not implement this properly as it requires complicated runtime patching which is not of nature of JIT. However, we'd like to validate if larger section appears and report to the user in the near future.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D129941
2022-07-26 12:51:33 +09:00
Sunho Kim 736b6311e1 [JITLink][COFF] Implement IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY/LIBRARY.
Implement IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY/LIBRARY characteristics flag.

Since COFFObjectFile class will set undefined flag for symbols with no alias flag, ORC ObjectFileInterface will not pull in this symbol. So, we only need to make sure the scope is local. NOLIBRARY and LIBRARY are handled in the same way for now. (which is what lld does right now)

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D129939
2022-07-26 12:46:34 +09:00
Sunho Kim 71eff61be6 [JITLink][COFF] Handle duplicate external symbols.
Handles duplicate external symbols. This happens in few static libraries generaed from msvc toolchain.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D129937
2022-07-26 12:44:04 +09:00
jacquesguan cb370cf413 [DAGCombiner] Teach scalarizeExtractedBinop to support scalable splat.
This patch supports the scalable splat part for scalarizeExtractedBinop.

Differential Revision: https://reviews.llvm.org/D129725
2022-07-26 09:31:45 +08:00
Amara Emerson 5ae0472694 [GlobalISel] Fix miscompile of G_UREM + G_UDIV due to not checking for equality
of the first operands of each.

Fixes issue #55287

Differential Revision: https://reviews.llvm.org/D130525
2022-07-25 16:03:05 -07:00
Craig Topper 45944e7cf4 [RISCV] Refactor translateSetCCForBranch to prepare for D130508. NFC.
D130508 handles more constants than just 1 or -1. We need to extract
the constant instead of relying isOneConstant or isAllOnesConstant.
2022-07-25 15:54:54 -07:00
Alexander Shaposhnikov 1e636f2676 [IRBuilder] Add assert for AtomicRMW ordering
Add assert for AtomicRMW: Ordering != AtomicOrdering::Unordered
(https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Verifier.cpp#L3944)
and adjust expandAtomicStore accordingly.

Test plan:
1/ ninja check-llvm check-clang check-lld
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D130457
2022-07-25 22:51:25 +00:00
Augie Fackler 85063090e9 MemoryBuiltins: remove malloc-family funcs from list
We no longer need specialized knowledge of these allocator functions in
this file since we have the correct attributes available now.

As far as I can tell the changes in the attributor tests are due to
things getting more consistent on alloc-family once we remove the static
list entries.

The two test changes in NewGVN merit extra scrutiny: NewGVN appears to
be _extremely_ sensitive to the inaccessiblememonly for reasons that
are beyond me. As a result, I had-enumerated all the attributes on
allocation functions in those two tests instead of using -inferattrs.
I assumed that the two -disable-simplify-libcalls tests there no
longer are sensible since the function declaration now includes all the
relevant attributes.

Differential Revision: https://reviews.llvm.org/D130107
2022-07-25 17:29:01 -04:00
Andrew Brown 3696a789d2 [WebAssembly] Use `localexec` as default TLS model for non-Emscripten targets
Only Emscripten supports dynamic linking with threads. To use
thread-local storage for other targets, this change defaults to the
`localexec` model.

Differential Revision: https://reviews.llvm.org/D130053
2022-07-25 13:25:46 -07:00
Matt Arsenault 62531518f9 RegAllocGreedy: Add a command line flag for reverseLocalAssignment
Introduce a flag like for some of the other target heuristic controls
to help with experimentation.
2022-07-25 15:47:15 -04:00
Matt Arsenault cb0c71e8b1 AMDGPU: Adjust register allocation priority values down
Set the priorities consistently to number of registers in the tuple -
1. Previously we started at 1, and also tried to give SGPR higher
values than VGPRs. There's no point in assigning SGPRs higher values
now that those are allocated in a separate regalloc run.

This avoids overflowing the 5 bits used for the class priority in the
allocation heuristic for 32 element tuples. This avoids some cases
where smaller registers unexpectedly get prioritized over larger.
2022-07-25 15:47:15 -04:00
Joseph Huber d61d72dae6 [OpenMP] Remove noinline attributes in the device runtime
We previously used the `noinline` attributes to specify some defintions
which should be kept alive in the runtime. These were then stripped
immediately in the OpenMPOpt module pass. However, Since the changes in
D130298, we not explicitly state which functions will have external
visiblity in the bitcode library. Additionally the OpenMPOpt module pass
should run before the inliner pass, so this shouldn't make a difference
in whether or not the functions will be alive for the initial pass of
OpenMPOpt. This should simplify the interface, and additionally save
time spend on scanning funciton names for noinline.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D130368
2022-07-25 15:44:50 -04:00
Sanjay Patel bfb9b8e075 [Passes] add a tail-call-elim pass near the end of the opt pipeline
We call tail-call-elim near the beginning of the pipeline,
but that is too early to annotate calls that get added later.

In the motivating case from issue #47852, the missing 'tail'
on memset leads to sub-optimal codegen.

I experimented with removing the early instance of
tail-call-elim instead of just adding another pass, but that
appears to be slightly worse for compile-time:
+0.15% vs. +0.08% time.
"tailcall" shows adding the pass; "tailcall2" shows moving
the pass to later, then adding the original early pass back
(so 1596886802 is functionally equivalent to 180b0439dc ):
https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright

Note that there was an effort to split the tail call functionality
into 2 passes - that could help reduce compile-time if we find
that this change costs more in compile-time than expected based
on the preliminary testing:
D60031

Differential Revision: https://reviews.llvm.org/D130374
2022-07-25 15:25:47 -04:00
Warren Ristow 3bbd380a5b [Reassociate][NFC] Use an appropriate dyn_cast for BinaryOperator
In D129523, it was noted that there is are some questionable naked casts
from Instruction to BinaryOperator, which could be addressed by doing a
dyn_cast directly to BinaryOperator, avoiding the need for the later cast.
This cleans up that casting.

Reviewed By: nikic, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D130448
2022-07-25 10:24:43 -07:00
Vladislav Dzhidzhoev fc93ba061a [GlobalISel][DebugInfo] Remove debug info with zero line from constants inserted at entry block
Emission of constants having DebugLoc with line 0 causes significant increase of debug_line section size for some source files.

To illustrate, we can compare section sizes of several files from llvm test-suite, built with SelectionDAG vs GlobalISel, on Aarch64 (macOS), using -O0 optimization level:

| Source path                                                    | SDAG text sz | GISel text sz | SDAG debug_line sz |  GISel debug_line sz
| -------------------------------------------------------------- | ------------ | ------------- | ------------------ | --------------------
| `SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c`   | 15320        | 660           | 14872              | 6340
| `SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c` | 33640        | 26300         | 2812               | 6693
| `SingleSource/Benchmarks/Misc/flops-4.c`                       | 1428         | 1196          | 594                | 1008
| `MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c`        | 2716         | 964           | 809                | 903
| `MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c`           | 2534         | 2502          | 189                | 573

For instance, here is a fragment of `flops-4.c.o` debug line section dump

```
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000    174      0      1   0             0  is_stmt
0x0000000000000010      0      0      1   0             0
0x0000000000000018    185      4      1   0             0  is_stmt prologue_end
0x000000000000001c      0      0      1   0             0
0x0000000000000024    186      4      1   0             0  is_stmt
0x000000000000002c    189     10      1   0             0  is_stmt
0x0000000000000030      0      0      1   0             0
0x0000000000000038    207     11      1   0             0  is_stmt
0x0000000000000044    208     11      1   0             0  is_stmt
0x0000000000000048      0      0      1   0             0
0x0000000000000058    210     10      1   0             0  is_stmt
0x000000000000005c      0      0      1   0             0
0x0000000000000060    211     10      1   0             0  is_stmt
0x0000000000000064      0      0      1   0             0
0x000000000000006c    212     10      1   0             0  is_stmt
0x0000000000000070      0      0      1   0             0
0x000000000000007c    213     10      1   0             0  is_stmt
0x0000000000000080      0      0      1   0             0
0x0000000000000088    214     10      1   0             0  is_stmt
0x000000000000008c      0      0      1   0             0
0x0000000000000094    215     10      1   0             0  is_stmt
```

Lot of zero lines are produced by constants (global values) having DebugLoc with line 0.
It seems that they're not significant for debugging experience.

With the commit applied, total size of debug_line sections of llvm shared libraries has reduced by 2.5%.
Change of debug line section size of files listed above:

| Source path                                                    | GISel debug_line sz | Patch debug_line sz
| -------------------------------------------------------------- | ------------------- | --------------------
| `SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c`   | 6340                | 1465
| `SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c` | 6693                | 3782
| `SingleSource/Benchmarks/Misc/flops-4.c`                       | 1008                | 609
| `MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c`        | 903                 | 841
| `MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c`           | 573                 | 190

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D127488
2022-07-25 17:19:01 +00:00
Craig Topper 1db6d6dcd8 [RISCV] Teach RISCVCodeGenPrepare to optimize (zext (abs(i32 X, i1 1))).
(abs(i32 X, i1 1) always produces a positive result. The 'i1 1'
means INT_MIN input produces poison. If the result is sign extended,
InstCombine will convert it to zext. This does not produce ideal
code for RISCV.

This patch reverses the zext back to sext which can be folded
into a subw or negw. Ideally we'd do this in SelectionDAG, but
we lose the INT_MIN poison flag when llvm.abs becomes ISD::ABS.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130412
2022-07-25 09:36:41 -07:00
Craig Topper 00060a7b97 [X86] Custom type legalize v2i32 smulo/umulo to use a single pmuldq/pmuludq.
With SSE4.1 and above we were using 3 multiply instructions. This
was due to type legalization widening to v4i32 and the low half
being done with pmulld while the high half used two pmuldq/pmuludq.

Instead of that, we can use a single pmuludq/pmuldq to calculate
the full product at once, extract the high and low bits and compare
to check for overflow.

I've restricted SMULO to sse4.1 to get pmuldq. We can probably
do a fixup to pmuludq on earlier targets, but that's for another day.

I was going through my git stash and found an early version of this patch
from a year or two ago so I went ahead and finished it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130432
2022-07-25 09:12:35 -07:00
Bradley Smith 953a98ef8d [AArch64][SVE] Fold target specific ext/trunc nodes into loads/stores
Due to the way fixed length SVE lowering works, we sometimes introduce
ext/trunc nodes very late, these nodes then immediately get converted
into target specific nodes (UUNPKLO/UZP1) before they get a chance to be
folded into a load/store.

This patch introduces target specific dag combines for these nodes so that
we can still create extending loads/truncating stores out of them.

Differential Revision: https://reviews.llvm.org/D128065
2022-07-25 15:24:05 +00:00
Sunho Kim 0f00e58841 [JITLink][COFF][x86_64] Reimplement ADDR32NB/REL32.
Reimplements ADDR32NB/REL32 relocations properly, out-of-reach targets will be dealt in the separate patch that will generate the stub for dllimport symbols.

Reviewed By: sgraenitz

Differential Revision: https://reviews.llvm.org/D129936
2022-07-25 23:41:53 +09:00
Sunho Kim c7ea209068 [ORC][COFF] Properly set weak flag to COMDAT symbols.
Properly set weak flag to COMDAT symbols so that no duplicate definition error will be generated. There is an inaccuracy in setting plain weak for largest selection type, which will be dealt with soon when largest type is properly implemented.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D129764
2022-07-25 23:24:25 +09:00
Sunho Kim aef75aec38 [JITLink][COFF] Implement IMAGE_SYM_CLASS_LABEL.
AcceptedPublic
Implements IMAGE_SYM_CLASS_LABEL. It's simply a section + offset. This is not used a lot by llvm mc but very commonly used by MSVC compiler.

Reviewed By: sgraenitz

Differential Revision: https://reviews.llvm.org/D129754
2022-07-25 23:22:59 +09:00
Sunho Kim 07aa8fc8db [JITLink][COFF] Handle out-of-order COMDAT second symbol.
Handle out-of-order COMDAT second symbols. In llvm codegen, the second symbol of COMDAT sequence always follows the first symbol in the global symbol list. But, when the object file came from MSVC compiler, these can come in out of order.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D129721
2022-07-25 23:02:31 +09:00
Sunho Kim b4878493dc [JITLink][COFF] Don't dead strip COMDAT associative symbol.
This prevents the dead strip of associative comdat section when parent section is alive.

Reviewed By: sgraenitz

Differential Revision: https://reviews.llvm.org/D129720
2022-07-25 22:59:19 +09:00
Weining Lu aff68f5ad6 [LoongArch] Parse LoongArch base ABI in ObjectYAML and llvm-readobj
LoongArch e_flags definition:
https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html#_e_flags_identifies_abi_type_and_version

Differential Revision: https://reviews.llvm.org/D130238
2022-07-25 20:40:57 +08:00
Cullen Rhodes c04ff587dc [AArch64] Combine setcc (iN (bitcast (vNi1 X))) with vecreduce_or
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130163
2022-07-25 12:14:33 +00:00
Benjamin Kramer 5fde785186 [ValueTracking] Fix unused variable warning in release builds. NFC 2022-07-25 13:28:32 +02:00
David Stuttard b14d7bf750 AMDGPU: Turn off force init 16 input SGPRS for pal
Pal uses a different mechanism for user sgprs.

Differential Revision: https://reviews.llvm.org/D129566
2022-07-25 10:52:46 +01:00
jacquesguan d8800ead62 [RISCV] Scalarize binop followed by extractelement.
This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.

Differential Revision: https://reviews.llvm.org/D129545
2022-07-25 17:23:31 +08:00
David Spickett 3a35bcef22 [llvm][FileCheck] Fix unit tests failures with EXPENSIVE_CHECKS
EXPENSIVE_CHECKS enables _GLIBCXX_DEBUG, which makes std::sort
check that the compare function is implemented correctly.

To do this it calls it with the first item as both sides.
Which trips the assert here because we think they're
2 capture ranges that overlap, when it's just the same range twice.

Check up front for the two sides being the same item
(same address, not just ==).

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130282
2022-07-25 08:19:28 +00:00
Nikita Popov fb7caa3c7b [AsmPrinter] Reject ptrtoint to larger size in lowerConstant()
When using a ptrtoint to a size larger than the pointer width in a
global initializer, we currently create a ptr & low_bit_mask style
MCExpr, which will later result in a relocation error during object
file emission.

This patch rejects the constant expression already during
lowerConstant(), which results in a much clearer error message
that references the constant expression at fault.

This fixes https://github.com/llvm/llvm-project/issues/56400,
for certain definitions of "fix".

Differential Revision: https://reviews.llvm.org/D130366
2022-07-25 10:18:27 +02:00
Rosie Sumpter 034a27e688 [AArch64] Add f16 fpimm patterns
This patch recognizes f16 immediates as legal and adds the necessary
patterns. This allows the fadda folding introduced in 05d424d165
to be applied to the f16 cases.

Differential Revision: https://reviews.llvm.org/D129989
2022-07-25 09:08:10 +01:00
Peter Waller f8919d2f7e [NFC][GVN] Put phi-translation of 'add' behind a switch
The code in this `#if 0` block appears to be a net benefit. Put it
behind a switch defaulting to off to support experimentation and as a
request for comment.

The codegen impact of enabling this that I'm currently persuing is that
it allows PRE to take place more frequently, particularly in loops with
second order recurrences.

Preliminary experimental data:

Across LNT on AArch64, 54 benchmarks are sped up by >1%, and 42 are
regressed by >1%, the geomean (exec_time_enabled / exec_time_disabled)
of these 96 "1% or greater significance" benchmarks is 0.991. For the
full set of 770 benchmarks it's 0.998.

There are two benchmarks which experience a >30% speedup, and the worst
slowdown is ~12%, and for every benchmark with a slowdown there is a
benckmark which is sped up by a greater factor.

Differential Revision: https://reviews.llvm.org/D130241
2022-07-25 07:59:47 +00:00
Cullen Rhodes 836f790bb1 [AArch64][SVE] Add patterns to select masked add/sub instructions
When lowering add(a, select(mask, b, splat(0))) the sel instruction can
be removed by using predicated add/sub instructions.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D129751
2022-07-25 07:22:05 +00:00
Fangrui Song 91e2cd4fa9 [llvm-objcopy] Remove getDecompressedSizeAndAlignment. NFC 2022-07-25 00:06:36 -07:00
Max Kazantsev a053f35990 [SCEV][NFC][CT] Cheaper handling of guards in isBasicBlockEntryGuardedByCond
Handle guards uniformly with assumes, rather than iterating through all
block instructions in attempt to find them.

Differential Revision: https://reviews.llvm.org/D129874
Reviewed By: nikic
2022-07-25 13:38:59 +07:00
Kazu Hirata 9d5a544d34 [Hexagon] Remove isLateInstrFeedsEarlyInstr (NFC)
The last use was removed on May 3, 2017 in commit
2af5037d34.

This patch also removes isLateResultInstr and isEarlySourceInstr as
they become dead once we remove isLateInstrFeedsEarlyInstr.
2022-07-24 22:55:14 -07:00
Kazu Hirata 95a932fb15 Remove redundaunt override specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 22:28:11 -07:00
Fangrui Song 7181c4e10a [llvm-objcopy] --compress-debug-sections: fix uninitialized ch_reserved for Elf64_Chdr
ch_reserved is uninitialized and the output is not deterministic. Fix it.
Rewrite and improve compress-debug-sections-zlib.test.
2022-07-24 22:19:00 -07:00
Kazu Hirata b5188591a0 [llvm] Remove redundaunt virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Fangrui Song 73c84f9c13 [llvm-objcopy] Remove remnant .zdebug code 2022-07-24 18:52:15 -07:00
Warren Ristow 3089b411a4 [Reassociate][NFC] Consistent checking for FastMathFlags suitability
In D129523, it was noted that the approach to check whether a value can
have FastMathFlags was done in different ways, and they should be made
consistent.  This patch makes minor changes to fix that.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D130408
2022-07-24 17:44:30 -07:00
Kazu Hirata acf648b5e9 Use llvm::less_first and llvm::less_second (NFC) 2022-07-24 16:21:29 -07:00
Kazu Hirata bafeb63448 [Hexagon] Remove unused declaration CanReturnSmallStruct (NFC)
The declaration was introduced without a corresponding definition on
Dec 12, 2011 in commit 1213a7a57f.
2022-07-24 14:48:09 -07:00
Kazu Hirata 49f72cb5bd [Hexagon] Remove unused declaration SelectZeroExtend (NFC)
The corresponding definition was removed on Jan 23, 2018 in commit
3780a0e1fa.
2022-07-24 14:48:08 -07:00
Kazu Hirata 8ac2d06195 [IPO] Use range-based for loops (NFC) 2022-07-24 14:48:06 -07:00
Sanjay Patel a925bef70c [ValueTracking] allow vector types in isImpliedCondition()
The matching of constants assumed integers, but we can handle
splat vector constants seamlessly with m_APInt.
2022-07-24 17:46:48 -04:00
Kazu Hirata ea29810c9d [CodeGen] Remove a redundant void (NFC)
Identified with modernize-redundant-void-arg.
2022-07-24 12:27:14 -07:00
Matt Arsenault 40abb28f61 RegAllocGreedy: Fix subranges when rematerializing dead subreg defs
This would create a new interval missing the subrange and hit this
verifier error:

*** Bad machine code: Live interval for subreg operand has no subranges ***
- function:    test_remat_subreg_def
- basic block: %bb.0  (0xa568758) [0B;128B)
- instruction: 32B	dead undef %4.sub0:vreg_64 = V_MOV_B32_e32 2, implicit $exec
2022-07-24 11:51:59 -04:00
Simon Pilgrim 562ee7cc5f [DAG] visitSMUL_LOHI/visitUMUL_LOHI - ensure we canonicalize constants to the RHS 2022-07-24 16:09:56 +01:00
Simon Pilgrim 428c0f2adc [DAG] getNode - assert that SMUL_LOHI/UMUL_LOHI nodes have the correct ops + types 2022-07-24 15:30:57 +01:00
Simon Pilgrim 0708771cce [DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC.
Just use KnownBits::isZero() to ensure all the bits are known zero.
2022-07-24 13:12:21 +01:00
Simon Pilgrim e82d49bfed [DAG] SimplifyMultipleUseDemandedBits - early-out for any scalable vector types
Noticed while working to remove SelectionDAG::GetDemandedBits - we were relying on the callers to have already bailed for scalable vectors
2022-07-24 12:59:43 +01:00
Simon Pilgrim a3e38b4a20 [DAG] SimplifyDemandedVectorElts - if every and/mul element-pair has a zero/undef then just constant fold to zero 2022-07-24 12:00:31 +01:00
Simon Pilgrim 69d1e805ce [X86] combineAndnp - remove unused variable. NFC. 2022-07-24 11:32:44 +01:00
Simon Pilgrim ce81a0df67 [X86][SSE] Enable X86ISD::ANDNP constant folding 2022-07-24 11:07:34 +01:00
Simon Pilgrim 293899c64b [X86] Don't assume an AND/ANDNP element is undef/undemanded just because one element is undef
For mask ops like these, the other operand's corresponding element might be zero (result = zero) - so we must demand all the bits and that element.

This appears to be what D128570 was trying to fix - both sides of the funnel shift mask of the vXi64 (legalized to v2Xi32) were incorrectly simplifying the upper 32-bit halves to undef, resulting in bad folds later on.

I intend to address the test case regressions, but this close to the release branch I'd prefer to get a fix in first.
2022-07-24 10:53:38 +01:00
Fangrui Song 7feab85df8 [MC] Remove unused renameELFSection 2022-07-24 01:23:07 -07:00
Fangrui Song 6977ff4006 [MC] Delete dead zlib-gnu code and simplify writeSectionData 2022-07-24 01:17:34 -07:00
Kazu Hirata 5bbe452e75 Revert "[Orc] Use default member initialization (NFC)"
This reverts commit d534967b66.

The patch causes build failures, such as:

https://lab.llvm.org/buildbot/#/builders/121/builds/21760
2022-07-23 21:10:10 -07:00
Kazu Hirata 068d5066b3 [Hexagon] Remove unused declaration getByteVectorTy (NFC)
The declaration was introduced without a corresponding definition on
Sep 7, 2020 in commit f5d07a05bb.
2022-07-23 19:40:44 -07:00
Fangrui Song 89357f0cb9 [Passes] Simplify ChangePrinter names. NFC 2022-07-23 19:32:13 -07:00
Kazu Hirata d534967b66 [Orc] Use default member initialization (NFC)
Identified with modernize-use-default-member-init
2022-07-23 18:36:23 -07:00
Craig Topper 9adc00a9d0 [RISCV] Add a continue to reduce nesting. NFC 2022-07-23 17:36:12 -07:00
Kazu Hirata 7bfa06f6c0 [CodeGen] Use range-based for loops (NFC) 2022-07-23 16:10:46 -07:00
Kazu Hirata 3736a498d4 [IPO] Use std::array for AccessKind2Accesses (NFC)
Switching to std:array allow us to use fill.

While I am at it, this patch also converts one for loop to a
range-based one.
2022-07-23 15:47:53 -07:00
Fangrui Song 7225213c0a [LegacyPM] Remove {,PostInline}EntryExitInstrumenterPass
Following recent changes removing non-core features of the legacy
PM/optimization pipeline.
2022-07-23 15:30:15 -07:00
Nuno Lopes 9df0b254d2 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-23 21:50:11 +01:00
Kazu Hirata 2d2e2e7ea9 [Vectorize] Remove isConsecutiveLoadOrStore (NFC)
The last use was removed on Jan 4, 2022 in commit
95a93722db.
2022-07-23 13:01:14 -07:00
Kazu Hirata ae998555ba [AMDGPU] Remove a redundant variable (NFC)
ArrayRef has operator[], so we don't need to access the contents via
data().
2022-07-23 12:29:05 -07:00
Kazu Hirata 97718180d7 [Analysis] Remove a redundant return statement (NFC)
Identified with readability-redundant-control-flow.
2022-07-23 11:35:19 -07:00
Fangrui Song c17450a094 [AMDGPU] Change DEBUG_TYPE from isel to amdgpu-isel
to match all other *ISelDAGToDAG.cpp
2022-07-23 11:32:02 -07:00
Kazu Hirata 85dadf6d8d [TableGen] Drop an unnecessary const from a return type (NFC)
This patch also drops "&" that binds to a temporary.

Identified with readability-const-return-type.
2022-07-23 11:30:23 -07:00
Simon Pilgrim ac8be21365 [DAG] isSplatValue - don't attempt to merge any BITCAST sub elements if they contain UNDEFs
We still haven't found a solution that correctly handles 'don't care' sub elements properly - given how close it is to the next release branch, I'm making this fail safe change and we can revisit this later if we can't find alternatives.

NOTE: This isn't a reversion of D128570 - it's the removal of undef handling across bitcasts entirely

Fixes #56520
2022-07-23 18:38:48 +01:00
Kazu Hirata 1cc7f5bede Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2022-07-23 09:22:27 -07:00
Simon Pilgrim 676a03d8a5 [X86] matchBinaryShuffle - limit SHUFFLE(X,Y) -> OR(X,Y) cases to where X + Y are the same width as the result
Minor bit of prep work toward not unnecessarily widening shuffle operands in combineX86ShufflesRecursively, instead only widening in combineX86ShuffleChain if we actual find a match - see Issue #45319
2022-07-23 16:56:45 +01:00
Dmitry Vassiliev 4acc02357e [IR] Fixed ambiguous call to llvm::report_fatal_error
This patch fixes the following error with MSVC 16.9.2:
llvm/lib/IR/GCStrategy.cpp(35): error C2668: 'llvm::report_fatal_error': ambiguous call to overloaded function
llvm/include/llvm/Support/ErrorHandling.h(75): note: could be 'void llvm::report_fatal_error(const llvm::Twine &,bool)'
llvm/include/llvm/Support/ErrorHandling.h(73): note: or 'void llvm::report_fatal_error(llvm::StringRef,bool)'
llvm/lib/IR/GCStrategy.cpp(35): note: while trying to match the argument list '(const std::string)'

Reviewed By: RKSimon, barannikov88

Differential Revision: https://reviews.llvm.org/D130407
2022-07-23 16:28:18 +02:00
Dmitri Gribenko aba43035bd Use llvm::sort instead of std::sort where possible
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D130406
2022-07-23 15:19:05 +02:00
Simon Pilgrim 5f89d2bae9 [DAG] Move OR(AND(X,C1),AND(OR(X,Y),C2)) -> OR(AND(X,OR(C1,C2)),AND(Y,C2)) fold to SimplifyDemandedBits
This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....).

It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek though multiple uses of the AND ops.
2022-07-23 13:17:24 +01:00
Simon Pilgrim 6aff1b7b3c [DAG] SimplifyDemandedBits - pull out repeated getValueType() calls. NFC. 2022-07-23 12:01:54 +01:00
Simon Pilgrim 2421a5af72 [DAG] ExpandIntRes_ADDSUB - create UADDO/USUBO instead of ADDCARRY/SUBCARRY if overflow is known to be zero
As noticed on D127115, when splitting ADD/SUB nodes we often end up with cases where overflow from the lower bits is impossible - in such cases we're better off breaking the carry chain dependency as soon as possible.

This path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.
2022-07-23 11:13:44 +01:00
Simon Pilgrim 8937252465 [DAG] computeKnownBits - add basic shift-by-parts handling
Concat KnownBits from ISD::SHL_PARTS / ISD::SRA_PARTS / ISD::SRL_PARTS lo/hi operands and perform the KnownBits calculation by the shift amount on the extended type, before splitting the KnownBits based on the requested lo/hi result.
2022-07-23 09:46:30 +01:00
Johannes Doerfert 6b7eae11f1 [Attributor][FIX] HasBeenWrittenTo logic should only be used for reads
If we look at a write, we should not enact the "has been written to"
logic introduced to avoid spurious write -> read dependences. Doing so
lead to elimination of stores we needed, which is obviously bad.
2022-07-22 23:57:57 -05:00
ARCHIT SAXENA 3bb1ce2319 Add a nop instruction if a section starts with landing pad for function splitter
This change adds a nop instruction if section starts with landing pad. This change is like [D73739](https://reviews.llvm.org/D73739) which avoids zero offset landing pad in basic block sections.

Detailed description:
The current machine functions splitter can create ˜sections which start with a landing pad themselves. This places landing pad at offset zero from LPStart.
```
	.section	.text.split.foo10,"ax",@progbits
foo10.cold:                             # %lpad
	.cfi_startproc
	.cfi_personality 3, __gxx_personality_v0
	.cfi_lsda 3, .Lexception5
	.cfi_def_cfa %rsp, 16
.Ltmp11: <--- This is a Landing pad and also LP Start as it is start of this section
	movq	%rax, %rdi <--- first instruction is at offest 0 from LPStart
	callq	_Unwind_Resume@PLT

 ```
This will cause landing pad entries to become zero (.Ltmp11-foo10.cold)
```
.Lcst_begin4:
	.uleb128 .Ltmp9-.Lfunc_begin2           # >> Call Site 1 <<
	.uleb128 .Ltmp10-.Ltmp9                 #   Call between .Ltmp9 and .Ltmp10
	.uleb128 .Ltmp11-foo10.cold  <---This is zero           #     jumps to .Ltmp11
	.byte	3                               #   On action: 2
	.uleb128 .Ltmp10-.Lfunc_begin2          # >> Call Site 2 <<
	.uleb128 .Lfunc_end9-.Ltmp10            #   Call between .Ltmp10 and .Lfunc_end9
	.byte	0                               #     has no landing pad
	.byte	0                               #   On action: cleanup
	.p2align	2
```
The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. This change adds a nop instruction at start of such sections so that such a case could be avoided. Output:
```
	.section	.text.split.foo10,"ax",@progbits
foo10.cold:                             # %lpad
	.cfi_startproc
	.cfi_personality 3, __gxx_personality_v0
	.cfi_lsda 3, .Lexception5
	.cfi_def_cfa %rsp, 16
	nop <--- new instruction that is added
.Ltmp11:
	movq	%rax, %rdi
	callq	_Unwind_Resume@PLT
```

Reviewed By: modimo, snehasish, rahmanl

Differential Revision: https://reviews.llvm.org/D130133
2022-07-22 15:20:10 -07:00
Alexander Shaposhnikov 2ebfda2417 [InstCombine] Improve folding of mul + icmp
This diff adds folds for patterns like X * A < B
where A, B are constants and "mul" has either "nsw" or "nuw".
(to address https://github.com/llvm/llvm-project/issues/56563).

Test plan:
1/ ninja check-llvm check-clang
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D130039
2022-07-22 22:08:53 +00:00
Kjetil Kjeka ff1920d106 [NVPTX] Promote i24, i40, i48 and i56 to next power-of-two register when passing
Today llc will crash when attempting to use non-power-of-two integer types as
function arguments or returns. This patch enables passing non standard integer
values in functions by promoting them before store and truncating after load.

The main motivation of implementing this change is that rust casts small structs
(less than pointer size) into an integer of the same size. As an example, if a
struct contains three u8 then it will be passed as an i24. This patch is a step
towards enabling rust compilation to ptx while retaining the target independent
optimizations.

More context can be found in https://github.com/llvm/llvm-project/issues/55764

Differential Revision: https://reviews.llvm.org/D129291
2022-07-22 14:14:12 -07:00
Alexander Yermolovich 6690c64639 Revert "[DWP][DWARF] Detect and error on debug info offset overflow"
This reverts commit 417738d3a6.
2022-07-22 13:02:08 -07:00
Sanjay Patel 08091a99ae Revert "[InstCombine] enhance fold for subtract-from-constant -> xor"
This reverts commit 79bb915fb6.
This caused regressions because SCEV works better with sub.
2022-07-22 15:56:24 -04:00
Philip Reames b5c7213647 [LV] Use early return to simplify code structure 2022-07-22 12:15:14 -07:00
Arnold Schwaighofer 58e6ee0e1f llvm.swift.async.context.addr cannot be modeled as NoMem because we don't want it to be cse'd accross async suspends
An async suspend models the split between two partial async functions.
`llvm.swift.async.context.addr ` will have a different value in the two
partial functions so it is not correct to generally CSE the instruction.

rdar://97336162

Differential Revision: https://reviews.llvm.org/D130201
2022-07-22 11:50:58 -07:00
Alexander Yermolovich 417738d3a6 [DWP][DWARF] Detect and error on debug info offset overflow
Right now we silently overflow uint32_t for debug_indfo sections. Added a check
and error out.

Differential Revision: https://reviews.llvm.org/D130046
2022-07-22 10:45:28 -07:00
Simon Pilgrim 939cf9b1be [AArch64] Use neon instructions for i64/i128 ISD::PARITY calculation
As noticed on D129765 and reported on Issue #56531 - aarch64 targets can use the neon ctpop + add-reduce instructions to speed up scalar ctpop instructions, but we fail to do this for parity calculations.

I'm not sure where the cutoff should be for specific CPUs, but i64 (+ i128 special case) shows a definite reduction in instruction count. i32 is about the same (but scalar <-> neon transfers are probably more costly?), and sub-i32 promotion looks to be a definite regression compared to parity expansion optimized for those widths.

Differential Revision: https://reviews.llvm.org/D130246
2022-07-22 17:24:17 +01:00
Mircea Trofin 7b81a81d5f [NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate
The name `getEntrySamples` was misleading for 2 reasons. One, it's
close in name to `Function::getEntryCount`, but the equivalent here is
`getHeadSamples`; second, as opposed to the other get* APIs in
`FunctionSamples`, it performs an estimate/heuristic rather than just
retrieving raw data (or a non-heuristic derivate off that data, like
`getMaxCountInside`)

The new name should more clearly communicate its intent; and, being
close (in name) to `getHeadSamples`, it should allow the reader discover
the relation between them.

Also updated the doc comments for both `getHeadSamples[Estimate]` so a
reader may better understand the relation between them.

Differential Revision: https://reviews.llvm.org/D130281
2022-07-22 09:17:59 -07:00
Benjamin Kramer 5a445395e4 [LV] Remove unused variable. NFC. 2022-07-22 17:43:58 +02:00
Craig Topper be208b40c1 [DAGCombiner] Simplify code around call to reduceLoadWidth in visitAND. NFC
We were looking for loads or any_extend+load. reduceLoadWidth
hasn't known how to look through such an any_extend to find the
load since D40667 almost 5 years ago.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130333
2022-07-22 08:36:56 -07:00
Philip Reames d7bf81fd51 [LV] Rework widening cost of uniform memory ops for clarity [nfc]
Reorganize the code to make it clear what is and isn't handle, and why.
Restructure bailout to remove (false and confusing) dependence on
CM_Scalarize; just return invalid cost and propagate, that's what it
is for.
2022-07-22 08:35:45 -07:00
Malhar Jajoo 41958f76d8 [Costmodel] Add "type-based-intrinsic-cost" cli option
This patch adds a command line flag to be able to test
the type based cost-model analysis for Intrinsics.

Differential Revision: https://reviews.llvm.org/D129109
2022-07-22 15:50:57 +01:00
Shubham Narlawar f55dbfbd9d [AArch64] Move SeparateConstOffsetFromGEPPass before LSR and enable EnableGEPOpt by default.
GEP's across basic blocks were not getting splitted due to EnableGEPOpt
which was turned off by default. Hence, EarlyCSE missed the opportunity
to eliminate common part of GEP's. This can be achieved by simply
turning GEP pass on.
 - This patch moves SeparateConstOffsetFromGEPPass() just before LSR.
 - It enables EnableGEPOpt by default.

Resolves - https://github.com/llvm/llvm-project/issues/50528

Added an unit test.

Differential Revision: https://reviews.llvm.org/D128582
2022-07-22 15:20:53 +01:00
Nikita Popov c2be703c6c [AsmPrinter] Move lowerConstant() error code out of switch (NFC)
Move this out of the switch, so that different branches can
indicate an error by breaking out of the switch. This becomes
important if there are more than the two current error cases.
2022-07-22 16:08:28 +02:00
Joseph Huber 3d0ab8638b [Internalize] Support glob patterns for API lists
The internalize pass supports an option to provide a list of symbols
that should not be internalized. THis is useful retaining certain
defintions that should be kept alive. However, this interface is
somewhat difficult to use as it requires knowing every single symbol's
name and specifying it. Many APIs provide common prefixes for the
symbols exported by the library, so it would make sense to be able to
match these using a simple glob pattern. This patch changes the handling
from a simple string comparison to a glob pattern match.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D130319
2022-07-22 08:24:32 -04:00
Petar Avramovic 8de1f04c77 [AMDGPU] gfx11 Fix VOP3 dot instructions
Fix src modifiers for operands with bf16 type.
op_sel[0:1] are ignored.

Differential Revision: https://reviews.llvm.org/D129084
2022-07-22 11:43:35 +02:00
Cullen Rhodes bf268a05cd [AArch64] Emit vector FP cmp when LE is used with fast-math
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130093
2022-07-22 07:53:55 +00:00
Johannes Doerfert a50b9f9f1f [Attributor][FIX] Handle non-recursive but re-entrant functions properly
If a function is non-recursive we only performed intra-procedural
reasoning for reachability (via AA::isPotentiallyReachable). However,
if it is re-entrant that doesn't mean we can't reach. Instead of this
problematic logic in the reachability reasoning we utilize logic in
AAPointerInfo. If a location is for sure written by a function it can
be re-entrant or recursive we know only intra-procedural reasoning is
sufficient.
2022-07-22 00:00:56 -05:00
Max Kazantsev a40af8589e [RS4GC] Handle special cases in unreachable code for memcpy/memmov
The existing code doesn't expect dummy values (undef, poison, null-derived
constants etc) as arguments of these intrinsics. However, they can be there
in unreached code. Currently we fail trying to find base for them.

Handle these cases separately. Return null as base for them to be consistent
with the handling in the main algorithm in findBaseDefiningValue.

Differential Revision: https://reviews.llvm.org/D129561
Reviewed By: apilipenko
2022-07-22 11:30:43 +07:00
Johannes Doerfert 62f7888d6d [Attributor] Dominating must-write accesses allow unknown initial values
If we have a dominating must-write access we do not need to know the
initial value of some object to perform reasoning about the potential
values. The dominating must-write has overwritten the initial value.
2022-07-21 23:08:43 -05:00
Johannes Doerfert c72d93a08a [Attributor][NFC] Remove unnecessary overwritten methods 2022-07-21 21:57:02 -05:00
Fangrui Song 9742166935 [LoongArch] Support load/store of dso_local PIC global values
lowerGlobalAddress added by D128427 can be used for PIC. The actual condition is
that the global value needs to be dso_local (a dso_preemptable one needs GOT
indirection).

load-store.ll has UB due to out-of-bounds load/store. Fix the UB in the variable
test and add an array test. Note: NOPIC array index is currently wrong.

Reviewed By: wangleiat

Differential Revision: https://reviews.llvm.org/D129977
2022-07-21 19:37:56 -07:00
Chenbing Zheng 1a0187c9e7 [InstCombine] remove useless ‘InstCombiner::’. nfc
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130220
2022-07-22 09:24:24 +08:00
Phoebe Wang 02fe96b240 [X86][FP16] Do not split FP64->FP16 to FP64->FP32->FP16
Truncation from double to half is not always identical to truncating to float first and then to half. https://godbolt.org/z/56s9517hd

On the other hand, expanding to float and then to double is always identical to expanding to double directly. https://godbolt.org/z/Ye8vbYPnY

Reviewed By: RKSimon, skan

Differential Revision: https://reviews.llvm.org/D130151
2022-07-22 08:36:05 +08:00
Ilia Diachkov b8e1544b9d [SPIRV] add SPIRVPrepareFunctions pass and update other passes
The patch adds SPIRVPrepareFunctions pass, which modifies function
signatures containing aggregate arguments and/or return values before
IR translation. Information about the original signatures is stored in
metadata. It is used during call lowering to restore correct SPIR-V types
of function arguments and return values. This pass also substitutes some
llvm intrinsic calls to function calls, generating the necessary functions
in the module, as the SPIRV translator does.

The patch also includes changes in other modules, fixing errors and
enabling many SPIR-V features that were omitted earlier. And 15 LIT tests
are also added to demonstrate the new functionality.

Differential Revision: https://reviews.llvm.org/D129730

Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
2022-07-22 04:00:48 +03:00
Philip Reames bd75350180 [LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst
This code confuses LV's "Uniform" and LVL/LAI's "Uniform".  Despite the
common name, these are different.
* LVs notion means that only the first lane *of each unrolled part* is
  required.  That is, lanes within a single unroll factor are considered
  uniform.  This allows e.g. widenable memory ops to be considered
  uses of uniform computations.
* LVL and LAI's notion refers to all lanes across all unrollings.

IsUniformMem is in turn defined in terms of LAI's notion.  Thus a
UniformMemOpmeans is a memory operation with a loop invariant address.
This means the same address is accessed in every iteration.

The tweaked piece of code was trying to match a uniform mem op (i.e.
fully loop invariant address), but instead checked for LV's notion of
uniformity.  In theory, this meant with UF > 1, we could speculate
a load which wasn't safe to execute.

This ends up being mostly silent in current code as it is nearly
impossible to create the case where this difference is visible.  The
closest I've come in the test case from 54cb87, but even then, the
incorrect result is only visible in the vplan debug output; before this
change we sink the unsafely speculated load back into the user's predicate
blocks before emitting IR.  Both before and after IR are correct so the
differences aren't "interesting".

The other test changes are uninteresting.  They're cases where LV's uniform
analysis is slightly weaker than SCEV isLoopInvariant.
2022-07-21 15:44:34 -07:00
Craig Topper ab2348a6fa [RISCV] Add sext.b/h and zext.b/h/w to RISCVInstrInfo::foldMemoryOperandImpl.
We can always fold zext.b since it is just andi. The others require
Zba/Zbb.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130302
2022-07-21 14:54:58 -07:00
Alexander Shaposhnikov e9afdf838e [GlobalOpt] Enable evaluation of atomic loads
Relax the check to allow evaluation of atomic loads
(but still skip volatile loads).

Test plan:
1/ ninja check-llvm check-clang
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D130211
2022-07-21 21:36:11 +00:00
Daniel Thornburgh cc0a1078f5 Fix use after free in MarkupFilter.cpp 2022-07-21 13:52:24 -07:00
Teresa Johnson 1dad6247d2 [MemProf] Add memprof metadata related analysis utilities
Adds a number of utilities that are used to help create and update
memprof related metadata. These will be used during profile matching
and annotation, as well as by the inliner when updating the metadata.
Also adds unit tests for the utilities.

See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]
(Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceeding patch adding the basic
metadata and verification support.)

Depends on D128141.

Differential Revision: https://reviews.llvm.org/D128854
2022-07-21 13:46:01 -07:00
Martin Storsjö 606348cc72 [MinGW] Don't currently set visibility=hidden when building for MinGW
If we build the Target libraries with -fvisibility=hidden, then
LLVM_EXTERNAL_VISIBILITY must also be able to override it back
to default visibility.

Currently, the LLVM_EXTERNAL_VISIBILITY define is a no-op for
mingw targets, thus set CMAKE_CXX_VISIBILITY_PRESET correspondingly.

This unbreaks the mingw dylib build, if the compiler actually
takes hidden visiblity into account (e.g. after D130121).

(Later, once hidden visiblity can be used for MinGW targets, we
can make LLVM_EXTERNAL_VISIBILITY and LLVM_LIBRARY_VISIBILITY expand
to actual attributes, and reverse this commit.)

Differential Revision: https://reviews.llvm.org/D130200
2022-07-21 23:16:33 +03:00
Augie Fackler bd6aa67e02 BuildLibCalls: move inference of freeing memory later
This probably should have been part of D123089, but the effects of it
don't show up until we start removing functions from the table in
D130107. Oops.

Differential Revision: https://reviews.llvm.org/D130184
2022-07-21 15:31:16 -04:00
Augie Fackler 62f48cadfd MemoryBuiltins: accept non-TLI funcs with attribs as allocator funcs
This allows us to accept annotations from out-of-tree languages (the
example test is derived from Rust) so they can enjoy the benefits of
LLVM's optimizations without requiring LLVM to have language-specific
knowledge.

Differential Revision: https://reviews.llvm.org/D123091
2022-07-21 15:31:16 -04:00
Augie Fackler 5a3e3675f6 MemoryBuiltins: start using properties of functions
Prior to this change, we relied on the hard-coded list for all of the
information performed by MemoryBuiltins. With this change, we're able to
start relying on properites of functions described in attributes, which
opens the door to out-of-tree compilers being able to describe their
allocator functions to LLVM's optimizer logic without having to register
their implementation details with LLVM.

Differential Revision: https://reviews.llvm.org/D123090
2022-07-21 15:31:15 -04:00
Sanjay Patel 78c09f0f24 [PatternMatch][InstCombine] match a vector with constant expression element(s) as a constant expression
The InstCombine test is reduced from issue #56601. Without the more
liberal match for ConstantExpr, we try to rearrange constants in
Negator forever.

Alternatively, we could adjust the definition of m_ImmConstant to be
more conservative, but that's probably a larger patch, and I don't
see any downside to changing m_ConstantExpr. We never capture and
modify a ConstantExpr; transforms just want to avoid it.

Differential Revision: https://reviews.llvm.org/D130286
2022-07-21 15:23:57 -04:00
Arthur Eubanks 04d398db46 [LoopAccessAnalysis] Simplify D119047
No need to add checks for every type per pointer that we couldn't create
a check for the first time around, just the types that weren't
successful.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119376
2022-07-21 12:16:02 -07:00
Daniel Thornburgh 6605187103 [NFC] Fix compiler warning in MarkupFilter 2022-07-21 12:00:29 -07:00
Daniel Thornburgh 17e4c217b6 [Symbolizer] Implement contextual symbolizer markup elements.
This change implements the contextual symbolizer markup elements: reset,
module, and mmap. These provide information about the runtime context of
the binary necessary to resolve addresses to symbolic values.

Summary information is printed to the output about this context.
Multiple mmap elements for the same module line are coalesced together.
The standard requires that such elements occur on their own lines to
allow for this; accordingly, anything after a contextual element on a
line is silently discarded.

Implementing this cleanly requires that the filter drive the parser;
this allows skipped sections to avoid being parsed. This also makes the
filter quite a bit easier to use, at the cost of some unused
flexibility.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D129519
2022-07-21 11:29:19 -07:00
Zequan Wu 4979b16db1 [llvm-cov] Improve error message by printing the object file name that produces error
If error occurs on constructing coverage info for one of the object files, it prints the name of the object file, so that users know which one is the cause of error.

Differential Revision: https://reviews.llvm.org/D130196
2022-07-21 11:26:51 -07:00
Pengxuan Zheng 53d7bf3052 [llvm-lib] Ignore /VERBOSE flag
Ignore the flag for now, but we can start using it for verbose output if needed.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D130202
2022-07-21 10:06:13 -07:00
David Sherwood f15b6b2907 [AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560
2022-07-21 17:20:06 +01:00
Ivan Kosarev 4b9dbbdb09 [AMDGPU][MC][NFC] Refine SMEM load definitions.
Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D130009
2022-07-21 14:56:56 +01:00
Ivan Kosarev 75950be836 [AMDGPU][NFC] Validate G_MERGE_VALUES as we match zero-extended 32-bit scalars.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D130001
2022-07-21 14:49:57 +01:00
Matt Arsenault 5a5439cb73 AMDGPU: Refine user-sgpr-init16-bug
It only applies to gfx1100 and gfx1102, and for wave32.
2022-07-21 08:57:00 -04:00
Nikita Popov 1f69503107 [MemoryBuiltins] Add getReallocatedOperand() function (NFC)
Replace the value-accepting isReallocLikeFn() overload with a
getReallocatedOperand() function, which returns which operand is
the one being reallocated. Currently, this is always the first one,
but once allockind(realloc) is respected, the reallocated operand
will be determined by the allocptr parameter attribute.
2022-07-21 14:54:16 +02:00
Nikita Popov 46e6dd84b7 [MemoryBuiltins] Remove isFreeCall() function (NFC)
Remove isFreeCall() in favor of getFreedOperand(). Replace the
two remaining uses with a getFreedOperand() != nullptr check, as
they only care that something is getting freed. (The usage in DSE
is correct as such. The allocator-related checks in CFLGraph look
rather questionable in general.)
2022-07-21 14:44:23 +02:00
Nikita Popov 5e856a8578 [InstCombine] Use getFreedOperand() (NFC)
Use getFreedOperand() instead of isFreeCall() to remove the
implicit assumption that any pointer operand to a free function
is the operand being freed. This won't actually matter until we
handle allockind(free).
2022-07-21 14:33:55 +02:00
Nikita Popov 3ac8587a2b [Attributor] Use getFreedOperand() (NFC)
Track which operand is actually freed, to avoid the implicit
assumption that it is the first call argument.
2022-07-21 14:26:47 +02:00
Thomas Symalla fd64a857ee [AMDGPU] Combine s_or_saveexec, s_xor instructions.
This patch merges a consecutive sequence of

s_or_saveexec s_o, s_i
s_xor exec, exec, s_o

into a single

s_andn2_saveexec s_o, s_i instruction.
This patch also cleans up the SIOptimizeExecMasking pass a bit.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D129073
2022-07-21 14:16:37 +02:00
Alexey Lapshin 8bb4451a65 [Reland][DebugInfo][llvm-dwarfutil] Combine overlapped address ranges.
DWARF files may contain overlapping address ranges. f.e. it can happen if the two
copies of the function have identical instruction sequences and they end up sharing.
That looks incorrect from the point of view of DWARF spec. Current implementation
of DWARFLinker does not combine overlapped address ranges. It would be good if such
ranges would be handled in some useful way. Thus, this patch allows DWARFLinker
to combine overlapped ranges in a single one.

Depends on D86539

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D123469
2022-07-21 14:15:39 +03:00
Matt Devereau cd3d7bf15d [AArch64][SVE] Add DAG-Combine to push bitcasts from floating point loads after DUPLANE128
This patch lowers
  duplane128(insert_subvector(undef, bitcast(op(128bitsubvec)), 0), 0)
to
  bitcast(duplane128(insert_subvector(undef, op(128bitsubvec), 0), 0)).

This enables floating-point loads to match patterns added in
https://reviews.llvm.org/D130010

Differential Revision: https://reviews.llvm.org/D130013
2022-07-21 11:00:10 +00:00
Matt Devereau e0fbd990c9 [AArch64][SVE] Add ISel pattern to lower DUPLANE128 to LD1RQD
Following on from https://reviews.llvm.org/D128902, lower DUPLANE128 to LD1RQD
for integer load types from instruction selection.

Differential Revision: https://reviews.llvm.org/D130010
2022-07-21 10:56:43 +00:00
Jay Foad 9383b09858 [AMDGPU][GlobalISel] Fix subtarget checks for combining to v_med3_i16
Differential Revision: https://reviews.llvm.org/D130243
2022-07-21 11:41:31 +01:00
Alexey Lapshin 3aad49082c Revert "[DebugInfo][llvm-dwarfutil] Combine overlapped address ranges."
This reverts commit d2a4d6bf9c.
2022-07-21 13:40:20 +03:00
Nikita Popov c81dff3c30 [MemoryBuiltins] Add getFreedOperand() function (NFCI)
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.

To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.

This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
2022-07-21 12:39:35 +02:00
Alexey Lapshin d2a4d6bf9c [DebugInfo][llvm-dwarfutil] Combine overlapped address ranges.
DWARF files may contain overlapping address ranges. f.e. it can happen if the two
copies of the function have identical instruction sequences and they end up sharing.
That looks incorrect from the point of view of DWARF spec. Current implementation
of DWARFLinker does not combine overlapped address ranges. It would be good if such
ranges would be handled in some useful way. Thus, this patch allows DWARFLinker
to combine overlapped ranges in a single one.

Depends on D86539

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D123469
2022-07-21 13:15:18 +03:00
Nikita Popov 8d58c8e57b Reapply [InstCombine] Don't check for alloc fn before fetching alloc size
Reapply the patch with getObjectSize() replaced by getAllocSize().
The former will also look through calls that return their argument,
and we'll end up placing dereferenceable attributes on intrinsics
like llvm.launder.invariant.group. While this isn't wrong, it also
doesn't seem to be particularly useful. For now, use getAllocSize()
instead, which sticks closer to the original behavior of this code.

-----

This code is just interested in the allocsize, not any other
allocator properties.
2022-07-21 11:48:24 +02:00
Nikita Popov d144ae6e1b [MemoryBuiltins] Default to trivial mapper in getAllocSize() (NFC)
Default getAllocSize() to use the trivial mapper. Also switch
from using std::function to function_ref.

Furthermore, update the doc comment to point out a subtle difference
between getAllocSize() and getObjectSize(): The latter may also
return something for calls that return their argument (via "returned"
attribute or special intrinsics like invariant groups).
2022-07-21 11:43:48 +02:00
jacquesguan e60eb7053d recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
With fix for AArch64 and Hexgon test cases.
2022-07-21 17:34:34 +08:00
Nikita Popov 235fb602ed [MemoryBuiltins] Don't query TLI for non-pointer functions (NFC)
Fetching allocation data for calls is a rather hot operation, and
TLI lookups are slow. We can greatly reduce the number of calls
for which TLI is queried by checking that they return a pointer
value first, as this is a requirement for allocation functions
anyway.
2022-07-21 11:28:36 +02:00
Nikita Popov 70056d04e2 Revert "[InstCombine] Don't check for alloc fn before fetching object size"
This reverts commit c72c22c04d.

This affected an Analysis test that I missed. Reverting for now.
2022-07-21 10:59:12 +02:00
Nikita Popov c72c22c04d [InstCombine] Don't check for alloc fn before fetching object size
This code is just interested in the allocsize, not any other
allocator properties.
2022-07-21 10:45:03 +02:00
Zi Xuan Wu (Zeson) 08db089124 [CSKY] Fix the testcase error due to the verifyInstructionPredicates
- Test cases for arch only has 16-bit instruction such as ck801/ck802 need
compile with -mattr=+btst16
- Fix the GPR copy instruction with MOV16 for 16-bit only arch.
2022-07-21 15:53:50 +08:00
Nikita Popov f45ab43332 [MemoryBuiltins] Avoid isAllocationFn() call before checking removable alloc
Alloc directly checking whether a given call is a removable
allocation, instead of first checking whether it is an allocation
first.
2022-07-21 09:39:19 +02:00
David Green 23d6186be0 [SelectionDAG] Fix fptoi.sat scalable vector lowering
Vector fptosi_sat and fptoui_sat were being expanded by unrolling the
vector operation. This doesn't work for scalable vector, so this patch
adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable.

Scalable tests are added for AArch64 and RISCV. Some of the AArch64
fptoi_sat operations should be legal, but that will be handled in
another patch.

Differential Revision: https://reviews.llvm.org/D130028
2022-07-21 08:00:22 +01:00
Congzhe Cao 05ccde8023 [LoopCacheAnalysis] Fix a type mismatch problem in cost calculation
There is a problem in loop cache analysis that the types of SCEV variables
`Coeff` and `ElemSize` in function `isConsecutive()` may not match. The
mismatch would cause SCEV failures when `Coeff` is multiplied with `ElemSize`.

The fix in this patch is to extend the type of both `Coeff` and `ElemSize` to
whichever is wider in those two variables. As a clean-up, duplicate calculations
of `Stride` in `computeRefCost()` is then removed.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D128877
2022-07-21 01:57:05 -04:00
Craig Topper add17fc8e4 [RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, fale)
(srl (and X, 1<<C), C) is the form we receive for testing bit C.
An earlier combine removed the setcc so it wasn't there to match
when we created the SELECT_CC. This doesn't happen for BR_CC because
generic DAG combine rebuilds the setcc if it is used by BRCOND.

We can shift X left by XLen-1-C to put the bit to be tested in the
MSB, and use a signed compare with 0 to test the MSB.
2022-07-20 22:32:11 -07:00
esmeyi 339392ecf2 [AIX] follow-up of D124654.
Emitting the remaining aliases instead of reporting
an error to avoid SPEC2017 PEAK failures.
And mark this as a TODO.
2022-07-21 01:10:09 -04:00
Craig Topper 7dda6c71b1 [RISCV] Refactor the common combines for SELECT_CC and BR_CC into a helper function.
The only difference between the combines were the calls to getNode
that include the true/false values for SELECT_CC or the chain
and branch target for BR_CC.

Wrap the rest of the code into a helper that reads LHS, RHS, and
CC and outputs new values and a bool if a new node needs to be
created.
2022-07-20 21:18:07 -07:00
Chenbing Zheng 8c124c9088 [InstCombine] (ShiftValC >> Y) >s -1/<s 0 --> Y != 0/==0
We can do folds (ShiftValC >> Y) >s -1 --> Y != 0 and
(ShiftValC >> Y) <s 0 --> Y == 0, with ShiftValC < 0.

Alive2: https://alive2.llvm.org/ce/z/-PRHfD

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129726
2022-07-21 10:12:29 +08:00
Chenbing Zheng 8075f680c8 [InstCombine] add fold (X > C - 1) ^ (X < C + 1) --> X != C
Considering the correctness of this pattern, we should avoid that C - 1
is non-negative and C + 1 is negative.

Alive2: https://alive2.llvm.org/ce/z/c_rBaq

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129622
2022-07-21 10:08:21 +08:00
Craig Topper 8983db15a3 [RISCV] Optimize (brcond (seteq (and X, 1 << C), 0))
If C > 10, this will require a constant to be materialized for the
And. To avoid this, we can shift X left by XLen-1-C bits to put the
tested bit in the MSB, then we can do a signed compare with 0 to
determine if the MSB is 0 or 1. Thanks to @reames for the suggestion.

I've implemented this inside of translateSetCCForBranch which is
called when setcc+brcond or setcc+select is converted to br_cc or
select_cc during lowering. It doesn't make sense to do this for
general setcc since we lack a sgez instruction.

I've tested bit 10, 11, 31, 32, 63 and a couple bits betwen 11 and 31
and between 32 and 63 for both i32 and i64 where applicable. Select
has some deficiencies where we receive (and (srl X, C), 1) instead.
This doesn't happen for br_cc due to the call to rebuildSetCC in the
generic DAGCombiner for brcond. I'll explore improving select in a
future patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130203
2022-07-20 18:40:49 -07:00
Anubhab Ghosh 4fcf8434dd [ORC] Add a new MemoryMapper-based JITLinkMemoryManager implementation.
MapperJITLinkMemoryManager supports executor memory management using any
implementation of MemoryMapper to do the transfer such as InProcessMapper or
SharedMemoryMapper.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D129495
2022-07-20 17:52:37 -07:00
Lang Hames aabc4b13e8 [ORC] Don't try to copy from an empty segment in SimpleExecutorMemoryManager.
Since 67220c2ad7 empty SPSSequence<char>s deserialize to default-constructed
ArrayRef<char>s, which have a null data field. We need to check for this to
avoid memcpy'ing from a nullptr.

This should fix the bot failure in
https://lab.llvm.org/buildbot/#/builders/85/builds/9323
2022-07-20 16:47:00 -07:00
Johannes Doerfert ad98ef8be4 [Attributor] Deal with complex PHI nodes better during AAPointerInfo
We were quite conservative when it came to PHI node handling to avoid
recursive reasoning. Now we check more direct if we have seen a PHI
already or not. This allows non-recursive PHI chains to be handled.

This also exposed a bug as we did only model the effect of one loop
traversal. `phi_no_store_3` has been adapted to show how we would have
used `undef` instead of `1` before. With this patch we don't replace
it at all, which is expected as we do not argue about loop iterations
(or alignments).
2022-07-20 17:34:50 -05:00
Johannes Doerfert 142897dd7d [Attributor] Only non-exact accesses require a uniform bit-pattern (=0)
If we only have exact accesses we should never require the bit-pattern
to be uniform (in this case 0). Only a non-exact access should force us
to require only 0 values.
2022-07-20 17:34:50 -05:00
Alexander Shaposhnikov 67f1fe8597 [GlobalOpt] Enable evaluation of atomic stores
Relax the check to allow evaluation of atomic stores
(but still skip volatile stores).

Test plan:
1/ ninja check-llvm check-clang
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D129841
2022-07-20 22:33:58 +00:00
Teresa Johnson 0174f5553e [MemProf] Basic metadata support and verification
Add basic support for the MemProf metadata (!memprof and !callsite)
which was initially described in "RFC: IR metadata format for MemProf"
(https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165).

The bulk of the patch is verification support, along with some tests.

There are a couple of changes to the format described in the original
RFC:

Initial measurements suggested that a tree format for the stack ids in
the contexts would be more efficient, but subsequent evaluation with
large applications showed that in fact the cost of the additional
metadata nodes required by this deduplication scheme overwhelmed the
benefit from sharing stack id nodes. Therefore, the implementation here
and in follow on patches utilizes a simpler scheme of lists of stack id
integers in the memprof profile contexts and callsite metadata. The
follow on matching patch employs context trimming optimizations to
reduce the cost.

Secondly, instead of verbosely listing all profiled fields in each
profiled context (memory info block or MIB), and deferring the
interpretation of the profile data, the profile data is evaluated and
converted into string tags specifying the behavior (e.g. "cold") during
profile matching. This reduces the verbosity of the profile metadata,
and allows additional context trimming optimizations. As a result, the
named metadata schema description is also no longer needed.

Differential Revision: https://reviews.llvm.org/D128141
2022-07-20 15:30:55 -07:00
Schrodinger ZHU Yifan 304027206c [ThinLTO] Support aliased GlobalIFunc
Fixes https://github.com/llvm/llvm-project/issues/56290: when an ifunc is
aliased in LTO, clang will attempt to create an alias summary; however, as ifunc
is not included in the module summary, doing so will lead to crash.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D129009
2022-07-20 15:30:38 -07:00
Craig Topper 31b8939ded [RISCV] Recognize bexti from (srl (and X, 1<<C), C).
This is the form we get for (zext (setne (and X 1<<C))). We only
had bexti patterns for the alternative form (and (srl X, C), 1).
2022-07-20 15:03:52 -07:00
Arthur Eubanks bc9b964f8f [NFC] Suppress unused variable warning in non-assert builds 2022-07-20 12:26:16 -07:00
Philip Reames f494f89b2a [LAA] Fix latent missing check bug when mixing scalable and non-scalabe strides
Noticed via inspection; to my knowledge, impossible to hit today.  In theory, we could have a fixed stride check be analyzed, then a scalable one.  With the old code, the scalable one would be silently dropped, and the runtime guard would go ahead with only the fixed one.  This would be a miscompile.
2022-07-20 11:56:45 -07:00
Craig Topper d76c8f5127 [InstCombine] Add mul with negated power of 2 constant to canEvaluateShifted.
If we are right shifting a multiply by a negated power of 2 where
the power of 2 is the same as the shift amount, we can replace with
a negate followed by an And.

New tests have not been committed yet but the patch shows the diffs.
Let me know if you want any changes or additional tests.

Differential Revision: https://reviews.llvm.org/D130103
2022-07-20 11:00:22 -07:00
Joe Nash dc850fbf3b [AMDGPU] NFC. Assert that mask is full with VOPC DPP
VOPC DPP should not be formed when the row_mask and bank_mask are not
0xf (full) because the resulting VOP DPP would have different semantics
than the MOV DPP followed by VOP. Existing checks in GCNDPPCombine cover
this case but for different reasons, so assert the property for
future-proofing.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D130101
2022-07-20 13:23:03 -04:00
Ruobing Han 2b98b8e8fb fix bug for useless malloc elimination in CodeGenPrepare
Put AllocationFn check before I->willReturn can allow CodeGenPrepare to remove useless malloc instruction

Differential Revision: https://reviews.llvm.org/D130126
2022-07-20 16:29:51 +00:00
Kazu Hirata 360c1111e3 Use llvm::is_contained (NFC) 2022-07-20 09:09:19 -07:00
Roman Rusyaev 394a388d14 [TableGen] Add a location for a class definition that was forward-declared
This change improves ctags generation for tablegen files.

For the following example
```
class A;

class A {
  int a;
}
```
Previously, tags were generated only for a forward declaration of class 'A'.

This patch allows generating tags for the forward declarations
and further definition of class 'A'.

Reviewed By: barannikov88

Original patch by: rusyaev-roman (Roman Rusyaev)
Some adjustments by: nhaehnle (Nicolai Hähnle)

Differential Revision: https://reviews.llvm.org/D129935
2022-07-20 15:56:17 +02:00
Philip Reames 523a526a02 [LV] Fix miscompile due to srem/sdiv speculation safety condition
An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block.

Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose.

Differential Revision: https://reviews.llvm.org/D130106
2022-07-20 05:35:23 -07:00
Nicolai Hähnle 1ddc51d89d Inliner: don't mark call sites as 'nounwind' if that would be redundant
When F calls G calls H, G is nounwind, and G is inlined into F, then the
inlined call-site to H should be effectively nounwind so as not to lose
information during inlining.

If H itself is nounwind (which often happens when H is an intrinsic), we
no longer mark the callsite explicitly as nounwind. Previously, there
were cases where the inlined call-site of H differs from a pre-existing
call-site of H in F *only* in the explicitly added nounwind attribute,
thus preventing common subexpression elimination.

v2:
- just check CI->doesNotThrow

v3 (resubmit after revert at 3443788087):
- update Clang tests

Differential Revision: https://reviews.llvm.org/D129860
2022-07-20 14:17:23 +02:00
Max Kazantsev e0ccd190ae [SCEV][NFC][CT] Do not waste time proving contextual facts for unreached loops and blocks
In fact, in unreached code we can say that every fact is true. So do not waste time trying to
do something smarter.

Formally it's not an NFC because it may change query results in unreached code, but they
won't have any impact on execution.

Hypothetical CT boost expected but not measured in practice.

Differential Revision: https://reviews.llvm.org/D129878
2022-07-20 19:02:28 +07:00
esmeyi b1847ff068 [XCOFF] write the aux header when the visibility is specified in XCOFF32.
The n_type field in the symbol table entry has two interpretations in XCOFF32, and a single interpretation in XCOFF64.
The new interpretation is used in XCOFF32 if the value of the o_vstamp field in the auxiliary header is 2.
In XCOFF64 and the new XCOFF32 interpretation, the n_type field is used for the symbol type and visibility.
The patch writes the aux header with an o_vstamp field value of 2 when the visibility is specified in XCOFF32 to make the new XCOFF32 interpretation used.

Reviewed By: DiggerLin, jhenderson

Differential Revision: https://reviews.llvm.org/D128148
2022-07-20 07:09:34 -04:00
Simon Pilgrim 029e83b401 [DAG] getNode - don't bother creating ADDO(X,0) or SUBO(X,0) nodes.
Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well.

As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen without a topological worklist.
2022-07-20 12:04:33 +01:00
David Green 4704da1374 [ARM] Fix Thumb2 compare being emitted ExpandCMP_SWAP
Given a patch like D129506, using instructions not valid for the current
target feature set becomes an error. This fixes an issue in
ARMExpandPseudo::ExpandCMP_SWAP where Thumb2 compares were used in
Thumb1Only code, such as thumbv8m.baseline targets.

Differential Revision: https://reviews.llvm.org/D129695
2022-07-20 12:04:22 +01:00
Simon Pilgrim 766cd95481 [DAG] getNode - assert that ADDO/SUBO nodes have the correct ops + types 2022-07-20 11:23:58 +01:00
Florian Hahn 5124b21648
[VPlan] Initial def-use verification.
This patch introduces some initial def-use verification. This catches
cases like the one fixed by D129436.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D129717
2022-07-20 11:06:32 +01:00
Simon Pilgrim 9fc347aa4e [DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents
PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target.

This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notable XOR -> NOT conversions) to occur.

Differential Revision: https://reviews.llvm.org/D129641
2022-07-20 10:49:31 +01:00
Chuanqi Xu 645d2dd3a9 Revert "Don't treat readnone call in presplit coroutine as not access memory"
This reverts commit 57224ff4a6. This
commit may trigger crashes on some workloads. Revert it for clearness.
2022-07-20 17:00:58 +08:00
Alexandros Lamprineas 051738b08c Reland "[AArch64] Add a tablegen pattern for UZP2."
Converts concat_vectors((trunc (lshr)), (trunc (lshr))) to UZP2
when the shift amount is half the width of the vector element.

Prioritize the ADDHN(2), SUBHN(2) patterns over UZP2.
Fixes https://github.com/llvm/llvm-project/issues/52919

Differential Revision: https://reviews.llvm.org/D130061
2022-07-20 09:47:32 +01:00
David Sherwood 79660d339e [LoopVectorize][AArch64] Add TTI hook preferPredicatedReductionSelect
By default if SVE is enabled we want the select instruction used for
reductions to be inside the loop, rather than outside. This makes it
possible for the backend to fold the select into the operation to
produce a single predicated add, fadd, etc.

Differential Revision: https://reviews.llvm.org/D129763
2022-07-20 09:33:29 +01:00
Lorenzo Albano 07d69d9fc9 [VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes
Add promotion and expansion of integer operands for
experimental_vp_strided SelectionDAG nodes; the expansion is actually
just a truncation of the stride operand.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D123112
2022-07-20 10:22:43 +02:00
Kazu Hirata 76e18cc4f6 [llvm] Use llvm::any_of and llvm::none_of (NFC) 2022-07-20 00:36:19 -07:00
Fangrui Song e931c2e870 [LegacyPM] Remove InstrOrderFileLegacyPass
Following recent changes removing non-core features of the legacy
PM/optimization pipeline.
2022-07-19 23:58:51 -07:00
Kazu Hirata 0387da6f4f Use value instead of getValue (NFC) 2022-07-19 21:18:26 -07:00
Haohai Wen d946fb8d95 [X86] Make sure load size is not larger than stack slot
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D130084
2022-07-20 12:17:44 +08:00
chenglin.bi d337c1f256 [AArch64] Use SUBXrx64 for dynamic stack to refer to sp
When we lower dynamic stack, we need to substract pattern `x15 << 4`  from sp.
Subtract instruction with arith shifted register(SUBXrs) can't refer to sp. So for now we need two extra mov like:

```
mov x0, sp
sub x0, x0, x15, lsl #4
mov sp, x0
```
If we want to refer to sp in subtract instruction like this:
```
sub	sp, sp, x15, lsl #4
```
We must use arith extended register version(SUBXrx).
So in this patch when we find sub have sp operand on src0, try to select to SubXrx64.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D129932
2022-07-20 11:46:10 +08:00
Kazu Hirata 41ae78ea3a Use has_value instead of hasValue (NFC) 2022-07-19 20:15:44 -07:00
Kazu Hirata bbbb4393ee [CodeGen] Use value_or instead of getValueOr (NFC) 2022-07-19 19:50:43 -07:00
Chuanqi Xu 57224ff4a6 Don't treat readnone call in presplit coroutine as not access memory
To solve the readnone problems in coroutines. See
https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015
for details.

According to the discussion, we decide to fix the problem by inserting
isPresplitCoroutine() checks in different passes instead of
wrapping/unwrapping readnone attributes in CoroEarly/CoroCleanup passes.
In this direction, we might not be able to cover every case at first.
Let's take a "find and fix" strategy.

Reviewed By: nikic, nhaehnle, jyknight

Differential Revision: https://reviews.llvm.org/D127383
2022-07-20 10:37:23 +08:00
Sanjay Patel f0dd12ec5c [x86] use zero-extending load of a byte outside of loops too (2nd try)
The first attempt missed changing test files for tools
(update_llc_test_checks.py).

Original commit message:

This implements the main suggested change from issue #56498.
Using the shorter (non-extending) instruction with only
-Oz ("minsize") rather than -Os ("optsize") is left as a
possible follow-up.

As noted in the bug report, the zero-extending load may have
shorter latency/better throughput across a wide range of x86
micro-arches, and it avoids a potential false dependency.
The cost is an extra instruction byte.

This could cause perf ups and downs from secondary effects,
but I don't think it is possible to account for those in
advance, and that will likely also depend on exact micro-arch.
This does bring LLVM x86 codegen more in line with existing
gcc codegen, so if problems are exposed they are more likely
to occur for both compilers.

Differential Revision: https://reviews.llvm.org/D129775
2022-07-19 21:27:08 -04:00
Jez Ng 2e2737cdf9 [MC][MachO] Change addrsig format + ensure its size is properly set
There were two problems with the previous setup:

1. We weren't setting its size, which caused problems when `__llvm_addrsig`
   wasn't the last section. In particular, `__debug_line` (if created) is
   generated and placed after `__llvm_addrsig`, and would result in an
   invalid object file w/ overlapping sections being emitted.

2. The symbol indices could be invalidated if e.g. `llvm-strip` ran on
   the object file. See discussion [here][1].

To fix both these issues, we use symbol relocations instead of encoding
symbol indices directly in the section contents. The section itself
doesn't contain any data. That sidesteps the layout problem in addition
to solving the second issue.

The corresponding LLD change to read in this new format: {D128938}.
It will fix the icf-safe.ll test failure on this diff.

[1]: https://discourse.llvm.org/t/problems-with-mach-o-address-significance-table-generation/63392/

Reviewed By: #lld-macho, alx32

Differential Revision: https://reviews.llvm.org/D127637
2022-07-19 21:22:23 -04:00
Johannes Doerfert f84712f0b8 [Attributor] Teach checkForAllUses to follow returns into callers
If we can determine all call sites we can follow a use in a return
instruction into the caller. AAPointerInfo utilizes this feature.
2022-07-19 18:17:40 -05:00
Johannes Doerfert 4f2ccdd0b1 [Attributor][NFC] Improve debug messages 2022-07-19 18:17:40 -05:00
Anubhab Ghosh 1b1f1c7786 Re-re-apply 5acd471698, Add a shared-memory based orc::MemoryMapper...
...with more fixes.

The original patch was reverted in 3e9cc543f2 due to bot failures caused by
a missing dependence on librt. That issue was fixed in 32d8d23cd0, but that
commit also broke sanitizer bots due to a bug in SimplePackedSerialization:
empty ArrayRef<char>s triggered a zero-byte memcpy from a null source. The
ArrayRef<char> serialization issue was fixed in 67220c2ad7, and this patch has
also been updated with a new custom SharedMemorySegFinalizeRequest message that
should avoid serializing empty ArrayRefs in the first place.

https://reviews.llvm.org/D128544
2022-07-19 15:35:33 -07:00
Nick Desaulniers 1cf6b93df1 Revert "[Local] Allow creating callbr with duplicate successors"
This reverts commit 08860f525a.

Crashes during PPC64LE linux kernel builds as reported by @nathanchance.
https://reviews.llvm.org/D129997#3663632
2022-07-19 15:03:27 -07:00
Lang Hames 6d8438314f [JITLink] Hook up prebuilt cache in DWARFRecordSectionSplitter::processBlock.
DWARFRecordSectionSplitter pre-builds a splitBlock cache, but wasn't passing it
to the call to splitBlock. This was an oversight in the original patch.
2022-07-19 15:03:14 -07:00
Sanjay Patel 95401b0153 Revert "[x86] use zero-extending load of a byte outside of loops too"
This reverts commit 9d1ea1774c.
There are tests of update_llc_tests_checks.py that missed being updated.
2022-07-19 17:37:22 -04:00
Johannes Doerfert bf789b1957 [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.

Fixes: https://github.com/llvm/llvm-project/issues/54981

Note: A previous version was flawed and consequently reverted in
      6555558a80.
2022-07-19 16:24:42 -05:00
Sanjay Patel 9d1ea1774c [x86] use zero-extending load of a byte outside of loops too
This implements the main suggested change from issue #56498.
Using the shorter (non-extending) instruction with only
-Oz ("minsize") rather than -Os ("optsize") is left as a
possible follow-up.

As noted in the bug report, the zero-extending load may have
shorter latency/better throughput across a wide range of x86
micro-arches, and it avoids a potential false dependency.
The cost is an extra instruction byte.

This could cause perf ups and downs from secondary effects,
but I don't think it is possible to account for those in
advance, and that will likely also depend on exact micro-arch.
This does bring LLVM x86 codegen more in line with existing
gcc codegen, so if problems are exposed they are more likely
to occur for both compilers.

Differential Revision: https://reviews.llvm.org/D129775
2022-07-19 16:43:47 -04:00
Yusra Syeda 6fb27bc2e3 [SystemZ][z/OS] Introduce CCAssignToRegAndStack to calling convention
Differential Revision: https://reviews.llvm.org/D127328
2022-07-19 13:55:25 -04:00
Cole Kissane e939bf67e3 [llvm] add zstd to `llvm::compression` namespace
- add zstd to `llvm::compression` namespace
- add a CMake option `LLVM_ENABLE_ZSTD` with behavior mirroring that of `LLVM_ENABLE_ZLIB`
- add tests for zstd to `llvm/unittests/Support/CompressionTest.cpp`
- debian users should install libzstd when using `LLVM_ENABLE_ZSTD=FORCE_ON` from source due to this bug https://bugs.launchpad.net/ubuntu/+source/libzstd/+bug/1941956

Reviewed By: leonardchan, MaskRay

Differential Revision: https://reviews.llvm.org/D128465
2022-07-19 10:54:36 -07:00
Jon Chesterfield 3a20597776 [amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060
2022-07-19 17:46:19 +01:00
Arthur Eubanks 13aa2c1c3b [DSE] Revisit pointers that may no longer escape after removing another store
In dependent-capture, previously we'd see that %tmp4 is captured due to
the first store. We'd cache this info in CapturedBeforeReturn and
InvisibleToCallerAfterRet. Then the first store is then removed, causing
the cached values to be wrong.

We also need to revisit everything because normally we work backwards
when removing stores at the end of the function, but in this case
removing an earlier store causes a later store to be removable.

No compile time impact:
https://llvm-compile-time-tracker.com/compare.php?from=56796ae1a8db4c85dada28676f8303a5a3609c63&to=21b7e5248ffc423cd36c9d4a020085e363451465&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D123686
2022-07-19 09:30:34 -07:00
Sanjay Patel 3d6c10dcf3 [SimplifyLibCalls] avoid converting pow() to powi() with no FMF
powi() is not a standard math library function; it is specified
with non-strict semantics in the LangRef. We currently require
'afn' to do this transform when it needs a sqrt(), so I just
extended that requirement to the whole-number exponent too.

This bug was introduced with:
b17754bcaa
...where we deferred expansion of pow() to later passes.
2022-07-19 12:26:53 -04:00
Jon Chesterfield 2224bbcd74 [nfc][amdgpu] LDS. Move selection logic up the stack. 2022-07-19 17:20:19 +01:00
David Truby 4c82f56d8f [llvm][SVE] Remove redundant and when comparing against extending load
When determining if an `and` should be merged into an extending load
the constant argument to the `and` is currently not checked if the
argument requires truncation. This prevents the combine happening when
the vector width is half the normal available vector width for SVE VLA
vectors.

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D129281
2022-07-19 17:08:32 +01:00
Arthur Eubanks 8c6305b8b4 [NewPM] Print function/SCC size with -debug-pass-manager
This is helpful for debugging issues with very large functions or SCC.
Also helpful when function names are very large and it's hard to tell the number of nodes in an SCC.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D128003
2022-07-19 09:00:37 -07:00
Arnold Schwaighofer bc4870f09e [coro async] Add missing llvm.coro.id.async intrinsic to declaresCoroCleanupIntrinsics
rdar://97214593

Differential Revision: https://reviews.llvm.org/D130038
2022-07-19 07:25:04 -07:00
Joe Nash b28bb8cc9c [AMDGPU] Remove old operand from VOPC DPP
For most DPP instructions, the old operand stores the value that was in
the current lane before the DPP operation, and is tied to the
destination. For VOPC DPP, this is unnecessary and incorrect.

There appears to have been a latent bug related to D122737 with
SIInstrInfo::isOperandLegal. If you checked if a register operand was legal
when the InstructionDesc expected an immediate, it reported that is valid.
Its fix is necessary for and tested in this patch.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D130040
2022-07-19 09:35:05 -04:00
Andrew Turner b850762b62 Add the FreeBSD AArch64 memory layout
Use the FreeBSD AArch64 memory layout values when building for it.
These are based on the x86_64 values, scaled to take into account the
larger address space on AArch64.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D125883
2022-07-19 09:58:07 -04:00
Andrew Turner e13bd2644e Add the FreeBSD AArch64 shadow offset to llvm
AArch64 has a larger address space than 64 but x86. Use the larger
shadow offset on FreeBSD AArch64.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D125873
2022-07-19 09:58:07 -04:00
Simon Pilgrim 71c502cbca [DAG] Call SimplifyDemandedBits from ISD::MUL nodes
Noticed while triaging D129765.
2022-07-19 14:11:04 +01:00
William Schmidt bccc9aa81c Don't vectorize PHIs in catchswitch blocks
We currently assert in vectorizeTree(TreeEntry*) when processing a PHI
bundle in a block containing a catchswitch.  We attempt to set the
IRBuilder insertion point following the catchswitch, which is invalid.
This is done so that ShuffleBuilder.finalize() knows where to insert
a shuffle if one is needed.

To avoid this occurring, watch out for catchswitch blocks during
buildTree_rec() processing, and avoid adding PHIs in such blocks to
the vectorizable tree.  It is unlikely that constraining vectorization
over an exception path will cause a noticeable performance loss, so
this seems preferable to trying to anticipate when a shuffle will and
will not be required.
2022-07-19 06:10:17 -07:00
Nikita Popov 08860f525a [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This
patch removes a limitation which prevents optimizations from actually
producing such callbrs.

Differential Revision: https://reviews.llvm.org/D129997
2022-07-19 14:28:22 +02:00
Alexey Lapshin 4539b44148 [Reland][Debuginfo][llvm-dwarfutil] llvm-dwarfutil dsymutil-like tool for ELF.
This patch implements proposal https://lists.llvm.org/pipermail/llvm-dev/2020-August/144579.html
llvm-dwarfutil - is a tool that is used for processing debug info(DWARF) located in built binary files to improve debug info quality, reduce debug info size. The patch currently implements smaller set of command-line options(comparing to the proposal):

```
./llvm-dwarfutil [options] <input file> <output file>

  --garbage-collection    Do garbage collection for debug info(default)
  -j <value>              Alias for --num-threads
  --no-garbage-collection Don`t do garbage collection for debug info
  --no-odr-deduplication  Don`t do ODR deduplication for debug types
  --no-odr                Alias for --no-odr-deduplication
  --no-separate-debug-file
                          Create single output file, containing debug tables(default)
  --num-threads <threads> Number of available threads for multi-threaded execution. Defaults to the number of cores on the current machine
  --odr-deduplication     Do ODR deduplication for debug types(default)
  --odr                   Alias for --odr-deduplication
  --separate-debug-file   Create two output files: file w/o debug tables and file with debug tables
  --tombstone [bfd,maxpc,exec,universal]
                          Tombstone value used as a marker of invalid address(default: universal)
    =bfd - Zero for all addresses and [1,1] for DWARF v4 (or less) address ranges and exec
    =maxpc - Minus 1 for all addresses and minus 2 for DWARF v4 (or less) address ranges
    =exec - Match with address ranges of executable sections
    =universal - Both: bfd and maxpc
```

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D86539
2022-07-19 15:11:36 +03:00
Benjamin Kramer 8aff88fd3a [LegalizeDAG] Propagate alignment in ExpandExtractFromVectorThroughStack
Unlike the name suggests this can reuse any store as a base for a
memory-based vector extract. If that store is underaligned the loads
created to extract will have an invalid alignment. Since most CPUs are
forgiving wrt alignment this is almost never an issue, on x86 this is
only reproducible by extracting a 128 bit vector out of a wider vector.

I tried making a test case in the context of
https://reviews.llvm.org/D127982 but it's really really fragile, as the
output pretty much looks like a missed optimization.
2022-07-19 13:13:55 +02:00
David Green 6cb9529001 [ARM] Remove VBICimm if no cleared bits are demanded
If none of the bits of a VBICimm are demanded, we can remove the node
entirely using the input operand instead.

Differential Revision: https://reviews.llvm.org/D129966
2022-07-19 11:53:47 +01:00
Florian Hahn a75760a269
[LV] Remove unnecessary cast in widenCallInstruction. (NFC) 2022-07-19 11:23:24 +01:00
Simon Pilgrim 0f6b0461b0 [DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.

This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115

Alive2: https://alive2.llvm.org/ce/z/fl7T7K

Differential Revision: https://reviews.llvm.org/D129933
2022-07-19 10:59:07 +01:00
Abinav Puthan Purayil 9fa425c1ab [AMDGPU] Set amdgpu-memory-bound if a basic block has dense global memory access
AMDGPUPerfHintAnalysis doesn't set the memory bound attribute if
FuncInfo::InstCost outweighs MemInstCost even if we have a basic block
with relatively high global memory access. GCNSchedStrategy could revert
optimal scheduling in favour of occupancy which seems to degrade
performance for some kernels. This change introduces the
HasDenseGlobalMemAcc metric in the heuristic that makes the analysis
more conservative in these cases.

This fixes SWDEV-334259/SWDEV-343932

Differential Revision: https://reviews.llvm.org/D129759
2022-07-19 15:16:28 +05:30
David Spickett 5d14873249 [llvm][AArch64] Add missing FPCR, H and B registers to Codeview mapping
Fixes https://github.com/llvm/llvm-project/issues/56484

H registers are 16 bit views of AArch64's Neon registers and
B are the 8 bit views.

msvc does not support 16 bit float (some mention in DirectX but I
couldn't find a way to get to it) so for lack of a better reference
I'm using:
85c9b41b33/server/references/dia/include/cvconst.h
(the other microsoft-pdb repo is no longer up to date)

Luckily clang does support fp16 so a test is added for that.

There is no 8 bit float type so I had to get creative with the
test case. We're not testing for correct debug info here just
that we can select the B register and not crash in the process.

For FPCR it's never going to be passed as an argument so I've
not added a test for it. It is included to keep our list looking
the same as the reference.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D129774
2022-07-19 09:33:13 +00:00
Alexey Lapshin e717f91c96 Revert "[Debuginfo][llvm-dwarfutil] llvm-dwarfutil dsymutil-like tool for ELF."
This reverts commit e2147c26bd.
2022-07-19 12:17:47 +03:00
Max Kazantsev 82309831c3 [LoopSimplifyCFG] Prevent use-def dominance breach by handling dead exits. PR56243
One of the transforms in LoopSimplifyCFG demands that the LCSSA form is
truly maintained for all values, tokens included, otherwise it may end up creating
a use that is not dominated by def (and Phi creation for tokens is impossible).
Detect this situation and prevent transform for it early.

Differential Revision: https://reviews.llvm.org/D129984
Reviewed By: efriedma
2022-07-19 15:54:12 +07:00
Alexey Lapshin e2147c26bd [Debuginfo][llvm-dwarfutil] llvm-dwarfutil dsymutil-like tool for ELF.
This patch implements proposal https://lists.llvm.org/pipermail/llvm-dev/2020-August/144579.html
llvm-dwarfutil - is a tool that is used for processing debug info(DWARF) located in built binary files to improve debug info quality, reduce debug info size. The patch currently implements smaller set of command-line options(comparing to the proposal):

```
./llvm-dwarfutil [options] <input file> <output file>

  --garbage-collection    Do garbage collection for debug info(default)
  -j <value>              Alias for --num-threads
  --no-garbage-collection Don`t do garbage collection for debug info
  --no-odr-deduplication  Don`t do ODR deduplication for debug types
  --no-odr                Alias for --no-odr-deduplication
  --no-separate-debug-file
                          Create single output file, containing debug tables(default)
  --num-threads <threads> Number of available threads for multi-threaded execution. Defaults to the number of cores on the current machine
  --odr-deduplication     Do ODR deduplication for debug types(default)
  --odr                   Alias for --odr-deduplication
  --separate-debug-file   Create two output files: file w/o debug tables and file with debug tables
  --tombstone [bfd,maxpc,exec,universal]
                          Tombstone value used as a marker of invalid address(default: universal)
    =bfd - Zero for all addresses and [1,1] for DWARF v4 (or less) address ranges and exec
    =maxpc - Minus 1 for all addresses and minus 2 for DWARF v4 (or less) address ranges
    =exec - Match with address ranges of executable sections
    =universal - Both: bfd and maxpc
```

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D86539
2022-07-19 11:18:36 +03:00
Cullen Rhodes f7b2d4aac6 [AArch64] Add patterns to fold zext(cmpeq(x, splat(0)))
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D129626
2022-07-19 08:14:38 +00:00
Nikita Popov 534b9246a2 [LoopInfo] Allow cloning of callbr
After D129288, callbr is safe to clone without special handling.
This permits optimizations like loop unroll and loop unswitch on
loops containing callbrs.

Fixes https://github.com/llvm/llvm-project/issues/41834.

Differential Revision: https://reviews.llvm.org/D129993
2022-07-19 09:57:28 +02:00
Rosie Sumpter 05d424d165 [AArch64][SVE] Fold fadda(ptrue, x, select(mask, y, -0.0)) into fadda(mask, x, y)
This patch adds an SVE pattern to recognize the use of a select with an
fadda in the form fadda(ptrue, x, select(mask, y, -0.0)). In this case
the select can be folded away, with the select mask used as the
predicate for fadda. This improves the codegen when vectorizing loops
with ordered fp reductions.

Differential Revision: https://reviews.llvm.org/D129623
2022-07-19 08:31:51 +01:00
Max Kazantsev 51f837a680 [NFC] Introduce API to detect tokens penetrating LCSSA form
Following discussion in PR56243, we need to somehow detect the situation
when token values penetrate LCSSA form for transforms that require that
it is maintained by all values (for example, to sustain use-def dominance
invarians). This patch introduces a parameter to LCSSA checkers to control
their ignorance about tokens.

Differential Revision: https://reviews.llvm.org/D129983
Reviewed By: efriedma
2022-07-19 13:52:30 +07:00
Max Kazantsev 69b284aaf6 Revert "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
This reverts commit 58dfaaaace.

Massive AARCH test failures in buildbot.
2022-07-19 13:41:52 +07:00
Bing1 Yu e01bf5a3e2 [X86] Promote v32f16's fadd into v32f32's fadd when it is avx512 without avx512fp16
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D130059
2022-07-19 14:37:50 +08:00
Shraiysh Vaishay 35fc666877 [OpenMP][IRBuilder] Add support for taskgroup
This patch adds support for generating taskgroup construct.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D128203
2022-07-19 10:49:34 +05:30
jacquesguan 58dfaaaace [DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat.
This revision supports to scalarize a binary operation of two scalable splat vectors.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122791
2022-07-19 11:20:51 +08:00
Kazushi (Jam) Marukawa 469044cfd3 [VE] Support load/store/spill of vector mask registers
Support load/store/spill of vector mask registers and add regression
tests.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D129415
2022-07-19 10:29:21 +09:00
zhongyunde bddf20735e [AArch64][NFC] Set true for default of subfeature is more readable
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D129960
2022-07-19 09:00:00 +08:00
ksyx 3198364e6e [RISCV][Clang] Add support for Zmmul extension
This patch implements recently ratified extension Zmmul, a subextension
of M (Integer Multiplication and Division) consisting only
multiplication part of it.

Differential Revision: https://reviews.llvm.org/D103313
Reviewed By: craig.topper, jrtc27, asb
2022-07-18 20:26:08 -04:00
Ellis Hoag 3580daacf3 [InstrProf] Allow CSIRPGO function entry coverage
The flag `-fcs-profile-generate` for enabling CSIRPGO moves the pass
`pgo-instrumentation` after inlining. Function entry coverage works fine
with this change, so remove the assert. I had originally left this
assert in because I had not tested this at the time.

Reviewed By: davidxl, MaskRay

Differential Revision: https://reviews.llvm.org/D129407
2022-07-18 15:10:11 -07:00
Matt Arsenault 8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
2022-07-18 17:23:41 -04:00
Stanislav Mekhanoshin 523a99c0eb [AMDGPU] Support for gfx940 fp8 smfmac
Differential Revision: https://reviews.llvm.org/D129908
2022-07-18 12:12:41 -07:00
Stanislav Mekhanoshin 2695f0a688 [AMDGPU] Support for gfx940 fp8 mfma
Differential Revision: https://reviews.llvm.org/D129906
2022-07-18 11:49:56 -07:00
Stanislav Mekhanoshin 9fa5a6b7e8 [AMDGPU] Support for gfx940 fp8 conversions
Differential Revision: https://reviews.llvm.org/D129902
2022-07-18 11:48:43 -07:00
Florian Hahn 30e53b8c03
[LV] Sink module variable and use State to set it in widenCall. (NFC)
Limits the lifetime of the variable and makes it independent of
CallInst.
2022-07-18 19:41:48 +01:00
Jay Foad dbed4326dd [LiveIntervals] Find better anchoring end points when repairing ranges
r175673 changed repairIntervalsInRange to find anchoring end points for
ranges automatically, but the calculation of Begin included the first
instruction found that already had an index. This patch changes it to
exclude that instruction:

1. For symmetry, so that the half open range [Begin,End) only includes
   instructions that do not already have indexes.
2. As a possible performance improvement, since repairOldRegInRange
   will scan fewer instructions.
3. Because repairOldRegInRange hits assertion failures in some cases
   when it sees a def that already has a live interval.

(3) fixes about ten tests in the CodeGen lit test suite when
-early-live-intervals is forced on.

Differential Revision: https://reviews.llvm.org/D110182
2022-07-18 19:34:43 +01:00
Mubariz Afzal c444f03787 Reland "[SystemZ][z/OS] Fix f32 variadic argument assertion"
This patch relands the f32 vararg assertion on z/OS fix that was reverted previously due to the testcase failing on non-z/OS platforms. It is now passing.

The tablegen lines that specify the XPLINK64 calling convention for promoting an f32 vararg to an f64 are effectively overwritten by the following tablegen line which bitcast an f64 vararg to an i64 (so that it can be used in the GPRs). Thus it becomes a bitcast from f32 to i64. We don't handle bitcasts for f32s and so this causes an assertion to be thrown.

We fix this by simplifying the tablegen lines to explicity show this behaviour, and allow the f32 in the bitcast case by first promoting it to an f64.
2022-07-18 14:25:17 -04:00
Craig Topper 0b02752899 [RISCV] Optimize (seteq (i64 (and X, 0xffffffff)), C1)
(and X, 0xffffffff) requires 2 shifts in the base ISA. Since we
know the result is being used by a compare, we can use a sext_inreg
instead of an AND if we also modify C1 to have 33 sign bits instead
of 32 leading zeros. This can also improve the generated code for
materializing C1.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D129980
2022-07-18 10:54:45 -07:00
Arnold Schwaighofer 28ebd13d63 [coro async] Fix code to run coro.async.end cleanup like the legacy pass did
The code executed for the Switch ABI does not change.

rdar://97074714

Differential Revision: https://reviews.llvm.org/D129865
2022-07-18 10:41:29 -07:00
Igor Kudrin 32eed8828e Reapply "[NVPTX] Use the mask() operator to initialize packed structs with pointers"
The original patch revealed an issue of reading incorrect values on BE hosts.
That is now changed to use `endian::read32le()` and `endian::read64le()`.

Original commit message:

The current implementation assumes that all pointers used in the
initialization of an aggregate are aligned according to the pointer size
of the target; that might not be so if the object is packed. In that
case, an array of .u8 should be used and pointers should be decorated
with the mask() operator.

The operator was introduced in PTX ISA 7.1, so an error is issued if the
case is detected for an earlier version.

Differential Revision: https://reviews.llvm.org/D127504
2022-07-18 20:56:26 +04:00
Craig Topper 7c0b9b379b [RISCV] Add isel patterns for ineg+setge/le/uge/ule.
setge/le/uge/ule selected by themselves require an xori with 1.
If we're negating the setcc, we can fold the xori with the neg
to create an addi with -1.

This works because xori X, 1 is equivalent to 1 - X if X is either
0 or 1. So we're doing -(1 - X) which is X-1 or X+-1.

This improves the code for selecting between 0 and -1 based on a
condition for some conditions.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129957
2022-07-18 09:55:01 -07:00
Fangrui Song b3fd3a9ac3 [IR] Allow absence for Min module flags and make AArch64 BTI/PAC-RET flags backward compatible
D123493 introduced llvm::Module::Min to encode module flags metadata for AArch64
BTI/PAC-RET. llvm::Module::Min does not take effect when the flag is absent in
one module. This behavior is misleading and does not address backward
compatibility problems (when a bitcode with "branch-target-enforcement"==1 and
another without the flag are merged, the merge result is 1 instead of 0).

To address the problems, require Min flags to be non-negative and treat absence
as having a value of zero. For an old bitcode without
"branch-target-enforcement"/"sign-return-address", its value is as if 0.

Differential Revision: https://reviews.llvm.org/D129911
2022-07-18 09:35:12 -07:00
Itay Bookstein 2570f226d1 [SDAG] Remove single-result restriction on commutative CSE
The DAG Combiner unnecessarily restricts commutative CSE
to nodes with a single result value. This commit removes
that restriction.

Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D129666
2022-07-18 19:19:13 +03:00
Craig Topper d7f2a63371 [RISCV] Fold stack reload into sext.w by using lw instead of ld.
We can use lw to load 4 bytes from the stack and sign extend them
instead of loading all 8 bytes.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129948
2022-07-18 09:09:17 -07:00
Igor Kudrin 1e451369d2 Revert "[NVPTX] Use the mask() operator to initialize packed structs with pointers"
The new test fails on BE hosts.

This reverts commit 04e978ccba.
2022-07-18 20:08:39 +04:00
Nicolai Hähnle 3443788087 Revert "Inliner: don't mark call sites as 'nounwind' if that would be redundant"
This reverts commit 9905c37981.

Looks like there are Clang changes that are affected in trivial ways. Will look into it.
2022-07-18 17:43:35 +02:00
Nicolai Hähnle 9905c37981 Inliner: don't mark call sites as 'nounwind' if that would be redundant
When F calls G calls H, G is nounwind, and G is inlined into F, then the
inlined call-site to H should be effectively nounwind so as not to lose
information during inlining.

If H itself is nounwind (which often happens when H is an intrinsic), we
no longer mark the callsite explicitly as nounwind. Previously, there
were cases where the inlined call-site of H differs from a pre-existing
call-site of H in F *only* in the explicitly added nounwind attribute,
thus preventing common subexpression elimination.

v2:
- just check CI->doesNotThrow

Differential Revision: https://reviews.llvm.org/D129860
2022-07-18 17:28:52 +02:00
Sanjay Patel 26fbb79c33 [InstCombine] reduce code for signbit folds; NFC 2022-07-18 11:04:58 -04:00
Lorenzo Albano c00a44fa68 [VP] IR expansion pass for VP gather and scatter
Add vp_gather and vp_scatter expansion to unpredicated intrinsics.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D120664
2022-07-18 17:00:38 +02:00
zhijian a6316d6da5 [AIX] support read global symbol of big archive
Reviewers: James Henderson, Fangrui Song

Differential Revision: https://reviews.llvm.org/D124865
2022-07-18 10:43:30 -04:00
Nikita Popov 21e2f133a8 [LoopSimplifyCFG] Revert accidental change
This change was included in an unrelated change
b57d61384c
and was of course not intended for commit...
2022-07-18 15:30:13 +02:00
Nikita Popov b57d61384c [ConstantRangeTest] Move nowrap binop tests to generic infrastructure (NFC)
Move testing for add/sub with nowrap flags to TestBinaryOpExhaustive,
rather than separate homegrown exhaustive testing functions.
2022-07-18 15:14:17 +02:00
Petar Avramovic c287bc4841 [AMDGPU][MC][GFX11] AsmParser for op_sel for VOP3 dpp opcodes
Parse op_sel for *_e64_dpp VOP3 opcodes.
Depends on D129637 and setting of VOP3_OPSEL in dpp pseudos.

Differential Revision: https://reviews.llvm.org/D129767
2022-07-18 15:08:52 +02:00
Simon Pilgrim 4f10516325 [AArch64] isDesirableToCommuteWithShift - add explicit ShiftLHS variable instead of altering a incoming argument variable
As discussed on D129995, altering the 'N' variable to point to shift's source value was confusing.
2022-07-18 13:28:07 +01:00
Simon Pilgrim 259c36e7c1 [DAG] Add asserts to isDesirableToCommuteWithShift overrides to ensure its being called from a shift. NFC. 2022-07-18 13:11:24 +01:00
Simon Pilgrim 07ab0cb4e7 [DAG] Add missing asserts to shouldFoldConstantShiftPairToMask overrides to ensure a shl/srl pair is used. NFC. 2022-07-18 13:11:23 +01:00
Nikita Popov 56b4b6e81b [SDAG] Fix release build
This variable was only declared in debug builds, but is needed
in release builds as well.
2022-07-18 14:10:31 +02:00
Benjamin Kramer 4bd072c56b [LAA] Fix the build with older versions of Clang
llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: no viable conversion from returned value of type 'SmallVector<[...], 2>' to function return type 'SmallVector<[...], (default)
      CalculateSmallVectorDefaultInlinedElements<T>::value aka 3>'
    return Scevs;
           ^~~~~
2022-07-18 14:01:47 +02:00
Benjamin Kramer 9234a7c0df [X86][FP16] Don't crash when lowering SELECT on fp16 vectors
This is a regression from f187948162
2022-07-18 13:41:00 +02:00