Commit Graph

162738 Commits

Author SHA1 Message Date
Krzysztof Parzyszek e5d9ab08c3 [Hexagon] Fix insertion point for pointer difference calculation
HVC::calculatePointerDifference inserts temporary instructions for
simplification, and calulation of known bits. These instructions were
inserted at the end of a basic block (after the terminator), which
caused BB->getTerminator() to return nullptr. This, in turn, caused
a crash when a PHI instruction was examined in computeKnownBits.
2022-10-19 14:23:39 -07:00
Teresa Johnson 646e25d051 [BitcodeReader] Convert pair to triple in preparation for MemProf (NFC)
Extracted from D135714 which adds summary support for MemProf. We will
need a 3rd tuple member in the ValueIdToValueInfoMap, this patch makes a
number of NFC changes to the existing clients of that map to reflect the
conversion of pair to tuple.
2022-10-19 13:34:30 -07:00
Michal Paszkowski 6beac40fe4 [SPIR-V] Add get_image_num_mip_levels implementation
Differential Revision: https://reviews.llvm.org/D135904
2022-10-19 22:29:16 +02:00
Michal Paszkowski 5fb4a05148 [SPIR-V] Add atomic_init and fix atomic explicit lowering
Differential Revision: https://reviews.llvm.org/D135902
2022-10-19 22:13:29 +02:00
Alexey Bataev b8b740c834 [SLP][NFC]Remove unused variable, NFC. 2022-10-19 12:35:27 -07:00
Fangrui Song c80b12d352 Revert D135427 "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally"
This reverts commit 8ef3fd8d59.

I mentioned that GlobalAlias was not handled. It turns out GlobalAlias has to be handled in the same patch (as opposed to in a follow-up),
as otherwise clang codegen of C5/D5 constructor/destructor would regress (https://reviews.llvm.org/D135427#3869003).
2022-10-19 11:24:12 -07:00
Yuanfang Chen 24c6ea917c [JMCInstrument] rename ELF section name from ".just.my.code" to ".data.just.my.code"
This gives linker scripts a hint about where to place the section.
2022-10-19 10:49:54 -07:00
Prabhdeep Singh Soni 6149589127 [OMPIRBuilder] Support depend clause for task
This patch adds support for the `depend` clause for the `task`
construct.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D135695
2022-10-19 13:11:43 -04:00
Chris Bieneman 607be386e7 [DX] Fix missing preserved analysis
The ShaderFlagsAnalysisWrapper needs to be marked to preserve all
analyssis.

Fixes #58474 (https://github.com/llvm/llvm-project/issues/58474)
2022-10-19 12:11:03 -05:00
Sander de Smalen 36864d47d6 [AArch64] Fix minor issue introduced in D135950.
The Key for the SubtargetMap had the StreamingSVEModeDisabled in the
wrong place. This change is non-functional, since the string (key) is
still unique.
2022-10-19 17:01:41 +00:00
Caroline Concatto 2ecbe8c38c [AArch64] SME2 Single-multi vector ternary int/FP 2 and 4 registers
This patch adds the assembly/disassembly for the following instructions:

For INT:
    ADD(array results, multiple and single vector): Add replicated single
        vector to multi-vector with ZA array vector results.
    SUB(array results, multiple and single vector): Subtract replicated single
        vector from multi-vector with ZA array vector results.
For FP:
    FMLA (multiple and single vector): Multi-vector floating-point fused
          multiply-add by vector.
    FMLS (multiple and single vector): Multi-vector floating-point
          multiply-subtract long by vector.
The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

The Matriz Operand has 2 new sizes 32(.s) and 64(.d) bits
(MatrixOp32 and MatrixOp64)

Depends on: D135448

Depends on:  D135952

Differential Revision: https://reviews.llvm.org/D135455
2022-10-19 17:49:48 +01:00
Sander de Smalen 137459aff6 [AArch64][SME] Disable (SLP|Loop)Vectorizer when function may be executed in streaming mode.
When the SME attributes tell that a function is or may be executed in Streaming
SVE mode, we currently need to be conservative and disable _any_ vectorization
(fixed or scalable) because the code-generator does not yet support generating
streaming-compatible code.

Scalable auto-vec will be gradually enabled in the future when we have
confidence that the loop-vectorizer won't use any SVE or NEON instructions
that are illegal in Streaming SVE mode.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D135950
2022-10-19 16:42:20 +00:00
Phoebe Wang bc1819389f [X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics
This is an alternative of D120395 and D120411.

Previously we use `__bfloat16` as a typedef of `unsigned short`. The
name may give user an impression it is a brand new type to represent
BF16. So that they may use it in arithmetic operations and we don't have
a good way to block it.

To solve the problem, we introduced `__bf16` to X86 psABI and landed the
support in Clang by D130964. Now we can solve the problem by switching
intrinsics to the new type.

Reviewed By: LuoYuanke, RKSimon

Differential Revision: https://reviews.llvm.org/D132329
2022-10-19 23:47:04 +08:00
Jay Foad f0ca946bf9 [AMDGPU] New helper function SIInsertWaitcnts::getVmemWaitEventType
This just commons up and simplifies some logic that was repeated in
SIInsertWaitcnts::updateEventWaitcntAfter. NFCI.

Differential Revision: https://reviews.llvm.org/D136253
2022-10-19 16:22:50 +01:00
Joe Nash ad6698562c [AMDGPU] V_LDEXP_F16 encoding fix and doc update.
The amdgcn.ldexp.* intrinsics take an i32 value as src1.
The V_LDEXP_F16 instruction considers src1 an f16 operand, and therefore
src1 is implicitly truncated to 16 bits when lowering to that instruction from the
intrinsic. This is unlikely to result in an error in practice
because values that large are not useful.

The operand class of src1 in the True16 version of the instruction has
been corrected to encode correctly on GFX11.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D136195
2022-10-19 09:52:53 -04:00
dbakunevich fecfd01252 [Verifier] Allow undef/poison token argument to llvm.experimental.gc.result
As part of the optimization in the unreachable code, we remove
tokens, thereby replacing them with undef/poison in intrinsics.
But the verifier falls on the assertion, within of what it sees
token poison in unreachable code, which in turn is incorrect.

bug: 57871, https://github.com/llvm/llvm-project/issues/57871
Differential Revision: https://reviews.llvm.org/D134427
2022-10-19 20:51:21 +07:00
Florian Hahn d72fcee8f4
[VPlan] Add VPValue::isDefinedOutsideVectorRegions helper (NFC).
@Ayal suggested a better named helper than using `!getDef()` to check if
a value is invariant across all parts.

The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133666
2022-10-19 13:20:30 +01:00
Simon Pilgrim 9708d88017 Revert rG42230efccf8fe1185be5fa6c23dce0a8183d6ec9 "[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1"
@foad was right - this isn't actually going to help with D136042 as much as hoped, we need a better AMDGPU-specific solution as other targets are likely to make use of it
2022-10-19 12:07:41 +01:00
Florian Hahn 1625224fbb
[SCEV] Replace assert with returning CouldNotComp in computeMaxBECountForLT.
This patch removes the bail out for signed predicates and non-positive
strides in howManyLessThans and updates computeMaxBECountForLT to return
SCEVCouldNotCompute for signed predicates with negative strides.

AFAICT bail-out was only added because computeMaxBECountForLT may not
handle negative signed strides correctly. Instead of not calling
computeMaxBECountForLT at all because we bail out earlier, we can
instead return SCEVCouldNotCompute in computeMaxBECountForLT.

The max backedge taken count will be computed as the max value of the
symbolic backedge taken count.

This improves precision in cases where we can compute symbolic backedge
taken counts and also fixes a crash.

Fixes #57818.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D135667
2022-10-19 11:24:10 +01:00
bipmis 38f3e44997 [AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads.
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.

Differential Revision: https://reviews.llvm.org/D135137
2022-10-19 11:22:58 +01:00
Serge Pavlov 0dec5e164f Keep configuration file search directories in ExpansionContext. NFC
Class ExpansionContext encapsulates options for search and expansion of
response files, including configuration files. With this change the
directories which are searched for configuration files are also stored
in ExpansionContext.

Differential Revision: https://reviews.llvm.org/D135439
2022-10-19 17:20:14 +07:00
Simon Pilgrim 42230efccf [DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1
Helps with some of the AMDGPU regressions identified in D136042 where we were losing signed BFE patterns after sinking shifts behind logic ops.

Differential Revision: https://reviews.llvm.org/D136081
2022-10-19 11:18:49 +01:00
Jay Foad ea09a426a9 [AMDGPU] Assume getDefIgnoringCopies will succeed. NFC.
getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on
valid MIR, so don't bother to check for failure.

Differential Revision: https://reviews.llvm.org/D136238
2022-10-19 11:10:00 +01:00
Caroline Concatto 579ca5e7e1 [AArch64] Replace sme-i64 by sme-i16i64 and sme-f64 by sme-f64f64
The names in developer.arm for these SME features are:
  HaveSMEI16I64 and HaveSMEF64F64
so the new flag names are consistent with the documentation page

Reviewed By: sdesmalen, c-rhodes

Differential Revision: https://reviews.llvm.org/D135974
2022-10-19 10:56:46 +01:00
Juan Manuel MARTINEZ CAAMAÑO bb24b2c610 [AMDGPU][Backend] Fix user-after-free in AMDGPUReleaseVGPRs::isLastVGPRUseVMEMStore
Reviewed By: jpages, arsenm

Differential Revision: https://reviews.llvm.org/D134641
2022-10-19 04:38:16 -05:00
Nikita Popov 747f27d97d [AA] Rename getModRefBehavior() to getMemoryEffects() (NFC)
Follow up on D135962, renaming the method name to match the new
type name.
2022-10-19 11:03:54 +02:00
Nikita Popov 1a9d9823c5 [AA] Rename uses of FunctionModRefBehavior (NFC)
Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variables names FMRB/MRB to ME, to match the
new type name.
2022-10-19 10:54:47 +02:00
luxufan 82c820b95c [RISCV] Enable the LocalStackSlotAllocation pass support
For RISC-V, load/store(exclude vector load/store) instructions only
has a 12 bit immediate operand. If the offset is out-of-range, it
must make use of a temp register to make up this offset. If between
these offsets, they have a small(IsInt<12>) relative offset,
LocalStackSlotAllocation pass can find a value as frame base register's
value, and replace the origin offset with this register's value plus
the relative offset.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98101
2022-10-19 16:15:14 +08:00
Freddy Ye 3ee58e2f35 [X86] Add WRMSRNS instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135935
2022-10-19 13:04:11 +08:00
Craig Topper 7a4e56acac [RISCV] Add an early out to lowerVECTOR_SHUFFLEAsVSlidedown. NFC
If Mask[0] is 0, then we're never going to match a slidedown. If
we get through the for loop, then it's an identity mask which should
have already been optimized out. Otherwise it's some non-contiguous
mask that will fail out of the lop. Might as well not bother entering
the loop.
2022-10-18 21:35:15 -07:00
Freddy Ye e3df4ba9d2 [X86] Add MSRLIST instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: skan, RKSimon

Differential Revision: https://reviews.llvm.org/D135934
2022-10-19 10:35:42 +08:00
chenglin.bi b18293edc3 [MC][COFF] Add COFF section flag "Info"
For now, we have not parse section flag `Info` in asm file. When we emit a section with info flag to asm, then compile asm to obj we will lose the Info flag for the section.
The motivation of this change is ARM64EC's hybmp$x section. If we lose the Info flag MSVC link will report a warning:
`warning LNK4078: multiple '.hybmp' sections found with different attributes`

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D136125
2022-10-19 10:32:58 +08:00
Weining Lu 771aee91c8 Reland "[LoongArch] Fix codegen of atomicrmw nand"
Fix invalid RISCV-like MI being emitted for performing the `not`
operation: the LoongArch `xori` zero-extends the immediate, hence is
not equivalent to RISCV `xori`. The LoongArch `not` is a `nor` with
zero.

Patch by lrzlin (Lin Runze).

Differential Revision: https://reviews.llvm.org/D136021
2022-10-19 10:05:35 +08:00
Chen Zheng df9d60af1f [PowerPC] handle more than two predecessors loop header in ctrloop pass
After ISEL, the "valid" loop header which has two predecessors
(one is preheader and the other one is latch) may be transformed
to have more than two predecessors by some optimizations, like tail
duplicator, if the old header's successor(will be changed to new
header) is a sub loop.

The predecessors of the new loop header are preheader, loop latch
and the loop latch(es) of the sub loop(old header's successor).

Before the patch, ctrloop pass assumes two predecessors for candidate
loop header. This patch fixes this case.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D135846
2022-10-19 01:11:58 +00:00
Weining Lu 0f374ca5cd Revert "[LoongArch] Fix codegen of atomicrmw nand"
This reverts commit 9572406bbc.

The author name is wrong.
2022-10-19 07:56:23 +08:00
Alexey Bataev 087dadfd37 [SLP]Generalize cost model.
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
  cost model for extractelement instructions.

cpu2017
   511.povray_r             0.57
   520.omnetpp_r           -0.98
   521.wrf_r               -0.01
   525.x264_r               3.59 <+
   526.blender_r           -0.12
   531.deepsjeng_r         -0.07
   538.imagick_r           -1.42
Geometric mean:  0.21

Differential Revision: https://reviews.llvm.org/D115757
2022-10-18 11:55:59 -07:00
Eli Friedman d6481dc88c [AArch64][Windows] Add MC support for save_any_reg.
Representing this as 12 separate operations is a bit ugly, but
trying to represent the different modes using a bitfield seemed worse.

Differential Revision: https://reviews.llvm.org/D135417
2022-10-18 11:45:27 -07:00
Alexey Bataev 62267e8de0 Revert "[SLP]Generalize cost model."
This reverts commit f12fb91188 and
f5c747bfbe to fix detected non-initialized
var use.
2022-10-18 11:25:59 -07:00
Sjoerd Meijer f7c42a278b Revert "Recommit "[LoopFlatten] Enable it by default""
This reverts commit 5b9597f59a.

A miscompilation was reported:
https://github.com/llvm/llvm-project/issues/58441

Reverting this while I look at that.
2022-10-18 23:36:36 +05:30
Alexey Bataev f5c747bfbe [SLP][NFC]Fix a warning for ?: with enum/unsigned, NFC. 2022-10-18 10:08:05 -07:00
Krzysztof Parzyszek 6a8cfe9a72 [Hexagon] Use shifts by scalar for funnel shifts by scalar
HVX has vector shifts by a scalar register. Use those in the expansions
of funnel shifts where profitable.
2022-10-18 09:49:17 -07:00
Chris Bieneman 6e05c8dfc8 [DX] Create globals for DXContainer parts
DXContainer files have a handful of sections that need to be written.
This adds a pass to write the section data into IR globals, and writes
the shader flag data into a global.

The test cases here verify that the shader flags are correctly written
from the IR into the global and emitted to the DXContainer.

This change also fixes a bug in the MCDXContainerWriter, where the size
of the dxbc::ProgramHeader was not being included in the part offset
calcuations. This is verified to be working by the new testcases where
obj2yaml can properly dump part data for parts after the DXIL part.

Resolves issue #57742 (https://github.com/llvm/llvm-project/issues/57742)

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135793
2022-10-18 11:48:08 -05:00
Florian Hahn c65513444b
[IndVars] Forget SCEV for instruction and users before replacing it.
Extra invalidation is needed here to clear stale values to fix a
verification failure.

Fixes #58440.
2022-10-18 17:38:14 +01:00
Mingming Liu 34d18fd241 [AArch64] Enhance bit-field-positioning op matcher to see through 'any_extend' for pattern 'and(any_extend(shl(val, N)), shifted-mask)'
Before this patch (and refactor patch D135843), isBitfieldPositioningOp won't handle "and(any_extend(shl(val, N), shifted-mask)" (bail out if AND op is not SHL)

After this patch, isBitfieldPositioningOp will see through "any_extend" to find "shl" to find possible bit-field-positioning nodes.

https://gcc.godbolt.org/z/3ncGKbGW6 is a four-liner LLVM IR that could be optimized to UBFIZ (see added test case test_and_extended_shift_with_imm in llvm/test/CodeGen/AArch64/bitfield-insert.ll). One existing test case also improves.

Differential Revision: https://reviews.llvm.org/D135852
2022-10-18 09:07:14 -07:00
Han-Kuan Chen 615af94dc2 [RISCV] Lower VECTOR_SHUFFLE to VSLIDEDOWN_VL.
Differential Revision: https://reviews.llvm.org/D136136
2022-10-18 08:58:39 -07:00
Anton Sidorenko 1978b4d968 [MachineCombiner][RISCV] Enable MachineCombiner for RISCV
Initial implementation to match basic FP reassociation patterns.

Differential Revision: https://reviews.llvm.org/D135264
2022-10-18 18:56:32 +03:00
Alexey Bataev f12fb91188 [SLP]Generalize cost model.
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
  cost model for extractelement instructions.

cpu2017
   511.povray_r             0.57
   520.omnetpp_r           -0.98
   521.wrf_r               -0.01
   525.x264_r               3.59 <+
   526.blender_r           -0.12
   531.deepsjeng_r         -0.07
   538.imagick_r           -1.42
Geometric mean:  0.21

Differential Revision: https://reviews.llvm.org/D115757
2022-10-18 08:49:32 -07:00
Arthur Eubanks 743087fb63 Port print-cfg-sccs to new pass manager
This is actually used, see https://discourse.llvm.org/t/use-print-callgrapg-sccs-from-opt/65782.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D135718
2022-10-18 08:47:08 -07:00
Arthur Eubanks 6219ec07c6 [SROA] Don't speculate phis with different load user types
Fixes an SROA crash.

Fallout from opaque pointers since with typed pointers we'd bail out at the bitcast.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D136119
2022-10-18 08:44:13 -07:00
Sanjay Patel 44b7da89d7 [InstCombine] fmul nnan X, 0.0 --> copysign(0.0, X)
https://alive2.llvm.org/ce/z/ybgM5F

Differential Revision: https://reviews.llvm.org/D136166
2022-10-18 11:34:02 -04:00