HVC::calculatePointerDifference inserts temporary instructions for
simplification, and calulation of known bits. These instructions were
inserted at the end of a basic block (after the terminator), which
caused BB->getTerminator() to return nullptr. This, in turn, caused
a crash when a PHI instruction was examined in computeKnownBits.
Extracted from D135714 which adds summary support for MemProf. We will
need a 3rd tuple member in the ValueIdToValueInfoMap, this patch makes a
number of NFC changes to the existing clients of that map to reflect the
conversion of pair to tuple.
This reverts commit 8ef3fd8d59.
I mentioned that GlobalAlias was not handled. It turns out GlobalAlias has to be handled in the same patch (as opposed to in a follow-up),
as otherwise clang codegen of C5/D5 constructor/destructor would regress (https://reviews.llvm.org/D135427#3869003).
This patch adds support for the `depend` clause for the `task`
construct.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135695
The Key for the SubtargetMap had the StreamingSVEModeDisabled in the
wrong place. This change is non-functional, since the string (key) is
still unique.
This patch adds the assembly/disassembly for the following instructions:
For INT:
ADD(array results, multiple and single vector): Add replicated single
vector to multi-vector with ZA array vector results.
SUB(array results, multiple and single vector): Subtract replicated single
vector from multi-vector with ZA array vector results.
For FP:
FMLA (multiple and single vector): Multi-vector floating-point fused
multiply-add by vector.
FMLS (multiple and single vector): Multi-vector floating-point
multiply-subtract long by vector.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
The Matriz Operand has 2 new sizes 32(.s) and 64(.d) bits
(MatrixOp32 and MatrixOp64)
Depends on: D135448
Depends on: D135952
Differential Revision: https://reviews.llvm.org/D135455
When the SME attributes tell that a function is or may be executed in Streaming
SVE mode, we currently need to be conservative and disable _any_ vectorization
(fixed or scalable) because the code-generator does not yet support generating
streaming-compatible code.
Scalable auto-vec will be gradually enabled in the future when we have
confidence that the loop-vectorizer won't use any SVE or NEON instructions
that are illegal in Streaming SVE mode.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D135950
This is an alternative of D120395 and D120411.
Previously we use `__bfloat16` as a typedef of `unsigned short`. The
name may give user an impression it is a brand new type to represent
BF16. So that they may use it in arithmetic operations and we don't have
a good way to block it.
To solve the problem, we introduced `__bf16` to X86 psABI and landed the
support in Clang by D130964. Now we can solve the problem by switching
intrinsics to the new type.
Reviewed By: LuoYuanke, RKSimon
Differential Revision: https://reviews.llvm.org/D132329
This just commons up and simplifies some logic that was repeated in
SIInsertWaitcnts::updateEventWaitcntAfter. NFCI.
Differential Revision: https://reviews.llvm.org/D136253
The amdgcn.ldexp.* intrinsics take an i32 value as src1.
The V_LDEXP_F16 instruction considers src1 an f16 operand, and therefore
src1 is implicitly truncated to 16 bits when lowering to that instruction from the
intrinsic. This is unlikely to result in an error in practice
because values that large are not useful.
The operand class of src1 in the True16 version of the instruction has
been corrected to encode correctly on GFX11.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D136195
As part of the optimization in the unreachable code, we remove
tokens, thereby replacing them with undef/poison in intrinsics.
But the verifier falls on the assertion, within of what it sees
token poison in unreachable code, which in turn is incorrect.
bug: 57871, https://github.com/llvm/llvm-project/issues/57871
Differential Revision: https://reviews.llvm.org/D134427
@Ayal suggested a better named helper than using `!getDef()` to check if
a value is invariant across all parts.
The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133666
@foad was right - this isn't actually going to help with D136042 as much as hoped, we need a better AMDGPU-specific solution as other targets are likely to make use of it
This patch removes the bail out for signed predicates and non-positive
strides in howManyLessThans and updates computeMaxBECountForLT to return
SCEVCouldNotCompute for signed predicates with negative strides.
AFAICT bail-out was only added because computeMaxBECountForLT may not
handle negative signed strides correctly. Instead of not calling
computeMaxBECountForLT at all because we bail out earlier, we can
instead return SCEVCouldNotCompute in computeMaxBECountForLT.
The max backedge taken count will be computed as the max value of the
symbolic backedge taken count.
This improves precision in cases where we can compute symbolic backedge
taken counts and also fixes a crash.
Fixes#57818.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135667
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.
Differential Revision: https://reviews.llvm.org/D135137
Class ExpansionContext encapsulates options for search and expansion of
response files, including configuration files. With this change the
directories which are searched for configuration files are also stored
in ExpansionContext.
Differential Revision: https://reviews.llvm.org/D135439
Helps with some of the AMDGPU regressions identified in D136042 where we were losing signed BFE patterns after sinking shifts behind logic ops.
Differential Revision: https://reviews.llvm.org/D136081
getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on
valid MIR, so don't bother to check for failure.
Differential Revision: https://reviews.llvm.org/D136238
The names in developer.arm for these SME features are:
HaveSMEI16I64 and HaveSMEF64F64
so the new flag names are consistent with the documentation page
Reviewed By: sdesmalen, c-rhodes
Differential Revision: https://reviews.llvm.org/D135974
Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variables names FMRB/MRB to ME, to match the
new type name.
For RISC-V, load/store(exclude vector load/store) instructions only
has a 12 bit immediate operand. If the offset is out-of-range, it
must make use of a temp register to make up this offset. If between
these offsets, they have a small(IsInt<12>) relative offset,
LocalStackSlotAllocation pass can find a value as frame base register's
value, and replace the origin offset with this register's value plus
the relative offset.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98101
If Mask[0] is 0, then we're never going to match a slidedown. If
we get through the for loop, then it's an identity mask which should
have already been optimized out. Otherwise it's some non-contiguous
mask that will fail out of the lop. Might as well not bother entering
the loop.
For now, we have not parse section flag `Info` in asm file. When we emit a section with info flag to asm, then compile asm to obj we will lose the Info flag for the section.
The motivation of this change is ARM64EC's hybmp$x section. If we lose the Info flag MSVC link will report a warning:
`warning LNK4078: multiple '.hybmp' sections found with different attributes`
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D136125
Fix invalid RISCV-like MI being emitted for performing the `not`
operation: the LoongArch `xori` zero-extends the immediate, hence is
not equivalent to RISCV `xori`. The LoongArch `not` is a `nor` with
zero.
Patch by lrzlin (Lin Runze).
Differential Revision: https://reviews.llvm.org/D136021
After ISEL, the "valid" loop header which has two predecessors
(one is preheader and the other one is latch) may be transformed
to have more than two predecessors by some optimizations, like tail
duplicator, if the old header's successor(will be changed to new
header) is a sub loop.
The predecessors of the new loop header are preheader, loop latch
and the loop latch(es) of the sub loop(old header's successor).
Before the patch, ctrloop pass assumes two predecessors for candidate
loop header. This patch fixes this case.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D135846
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
cost model for extractelement instructions.
cpu2017
511.povray_r 0.57
520.omnetpp_r -0.98
521.wrf_r -0.01
525.x264_r 3.59 <+
526.blender_r -0.12
531.deepsjeng_r -0.07
538.imagick_r -1.42
Geometric mean: 0.21
Differential Revision: https://reviews.llvm.org/D115757
Representing this as 12 separate operations is a bit ugly, but
trying to represent the different modes using a bitfield seemed worse.
Differential Revision: https://reviews.llvm.org/D135417
DXContainer files have a handful of sections that need to be written.
This adds a pass to write the section data into IR globals, and writes
the shader flag data into a global.
The test cases here verify that the shader flags are correctly written
from the IR into the global and emitted to the DXContainer.
This change also fixes a bug in the MCDXContainerWriter, where the size
of the dxbc::ProgramHeader was not being included in the part offset
calcuations. This is verified to be working by the new testcases where
obj2yaml can properly dump part data for parts after the DXIL part.
Resolves issue #57742 (https://github.com/llvm/llvm-project/issues/57742)
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D135793
Before this patch (and refactor patch D135843), isBitfieldPositioningOp won't handle "and(any_extend(shl(val, N), shifted-mask)" (bail out if AND op is not SHL)
After this patch, isBitfieldPositioningOp will see through "any_extend" to find "shl" to find possible bit-field-positioning nodes.
https://gcc.godbolt.org/z/3ncGKbGW6 is a four-liner LLVM IR that could be optimized to UBFIZ (see added test case test_and_extended_shift_with_imm in llvm/test/CodeGen/AArch64/bitfield-insert.ll). One existing test case also improves.
Differential Revision: https://reviews.llvm.org/D135852
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
cost model for extractelement instructions.
cpu2017
511.povray_r 0.57
520.omnetpp_r -0.98
521.wrf_r -0.01
525.x264_r 3.59 <+
526.blender_r -0.12
531.deepsjeng_r -0.07
538.imagick_r -1.42
Geometric mean: 0.21
Differential Revision: https://reviews.llvm.org/D115757
Fixes an SROA crash.
Fallout from opaque pointers since with typed pointers we'd bail out at the bitcast.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136119