Commit Graph

Krzysztof Parzyszek e5d9ab08c3 [Hexagon] Fix insertion point for pointer difference calculation
HVC::calculatePointerDifference inserts temporary instructions for
simplification and calculation of known bits. These instructions were
inserted at the end of a basic block (after the terminator), which
caused BB->getTerminator() to return nullptr. This, in turn, caused
a crash when a PHI instruction was examined in computeKnownBits.
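
A minimal sketch of the fix pattern, assuming IRBuilder-style insertion
(emitTempSub is a hypothetical helper, not the actual HVC code):
```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Hypothetical helper: place temporary instructions before the
// terminator, so BB->getTerminator() stays valid for computeKnownBits.
static Value *emitTempSub(BasicBlock *BB, Value *A, Value *B) {
  IRBuilder<> Builder(BB->getTerminator()); // not past the terminator
  return Builder.CreateSub(A, B, "ptrdiff");
}
```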
2022-10-19 14:23:39 -07:00
Michal Paszkowski 6beac40fe4 [SPIR-V] Add get_image_num_mip_levels implementation
Differential Revision: https://reviews.llvm.org/D135904
2022-10-19 22:29:16 +02:00
Michal Paszkowski 5fb4a05148 [SPIR-V] Add atomic_init and fix atomic explicit lowering
Differential Revision: https://reviews.llvm.org/D135902
2022-10-19 22:13:29 +02:00
Chris Bieneman 607be386e7 [DX] Fix missing preserved analysis
The ShaderFlagsAnalysisWrapper needs to be marked to preserve all
analyses.

Fixes #58474 (https://github.com/llvm/llvm-project/issues/58474)
2022-10-19 12:11:03 -05:00
Sander de Smalen 36864d47d6 [AArch64] Fix minor issue introduced in D135950.
The Key for the SubtargetMap had the StreamingSVEModeDisabled in the
wrong place. This change is non-functional, since the string (key) is
still unique.
2022-10-19 17:01:41 +00:00
Caroline Concatto 2ecbe8c38c [AArch64] SME2 Single-multi vector ternary int/FP 2 and 4 registers
This patch adds the assembly/disassembly for the following instructions:

For INT:
    ADD(array results, multiple and single vector): Add replicated single
        vector to multi-vector with ZA array vector results.
    SUB(array results, multiple and single vector): Subtract replicated single
        vector from multi-vector with ZA array vector results.
For FP:
    FMLA (multiple and single vector): Multi-vector floating-point fused
          multiply-add by vector.
    FMLS (multiple and single vector): Multi-vector floating-point fused
          multiply-subtract by vector.
The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

The Matrix operand has two new sizes, 32-bit (.s) and 64-bit (.d)
(MatrixOp32 and MatrixOp64).

Depends on: D135448

Depends on: D135952

Differential Revision: https://reviews.llvm.org/D135455
2022-10-19 17:49:48 +01:00
Sander de Smalen 137459aff6 [AArch64][SME] Disable (SLP|Loop)Vectorizer when function may be executed in streaming mode.
When the SME attributes indicate that a function is or may be executed in Streaming
SVE mode, we currently need to be conservative and disable _any_ vectorization
(fixed or scalable) because the code-generator does not yet support generating
streaming-compatible code.

Scalable auto-vec will be gradually enabled in the future when we have
confidence that the loop-vectorizer won't use any SVE or NEON instructions
that are illegal in Streaming SVE mode.
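
A minimal sketch of the gating this implies; the helper name is
hypothetical, though the streaming-mode attribute strings are the ones
the AArch64 backend defines for SME:
```
#include "llvm/IR/Function.h"
using namespace llvm;

// Hypothetical helper: conservatively treat a function as streaming if
// any of the SME streaming-mode attributes is present.
static bool mayRunInStreamingMode(const Function &F) {
  return F.hasFnAttribute("aarch64_pstate_sm_enabled") ||
         F.hasFnAttribute("aarch64_pstate_sm_compatible") ||
         F.hasFnAttribute("aarch64_pstate_sm_body");
}
// A vectorizer would then bail out early:
//   if (mayRunInStreamingMode(F)) return false; // no fixed or scalable vec
```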

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D135950
2022-10-19 16:42:20 +00:00
Phoebe Wang bc1819389f [X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics
This is an alternative of D120395 and D120411.

Previously we used `__bfloat16` as a typedef of `unsigned short`. The
name may give users the impression that it is a brand-new type for
BF16, so they may use it in arithmetic operations, and we don't have a
good way to block that.

To solve the problem, we introduced `__bf16` to X86 psABI and landed the
support in Clang by D130964. Now we can solve the problem by switching
intrinsics to the new type.
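
A sketch of why the typedef was hard to police (illustrative, not the
header's actual contents):
```
typedef unsigned short __bfloat16; // the old typedef-based "type"

unsigned short misuse(__bfloat16 a, __bfloat16 b) {
  // Compiles without complaint, yet this is integer arithmetic on raw
  // BF16 bit patterns -- a typedef gives the compiler no way to reject it.
  return a + b;
}
// With the intrinsics switched to __bf16 (a distinct type in Clang),
// such arithmetic misuse becomes diagnosable.
```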

Reviewed By: LuoYuanke, RKSimon

Differential Revision: https://reviews.llvm.org/D132329
2022-10-19 23:47:04 +08:00
Jay Foad f0ca946bf9 [AMDGPU] New helper function SIInsertWaitcnts::getVmemWaitEventType
This just commons up and simplifies some logic that was repeated in
SIInsertWaitcnts::updateEventWaitcntAfter. NFCI.

Differential Revision: https://reviews.llvm.org/D136253
2022-10-19 16:22:50 +01:00
Joe Nash ad6698562c [AMDGPU] V_LDEXP_F16 encoding fix and doc update.
The amdgcn.ldexp.* intrinsics take an i32 value as src1.
The V_LDEXP_F16 instruction considers src1 an f16 operand, and therefore
src1 is implicitly truncated to 16 bits when lowering to that instruction from the
intrinsic. This is unlikely to result in an error in practice
because values that large are not useful.

The operand class of src1 in the True16 version of the instruction has
been corrected to encode correctly on GFX11.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D136195
2022-10-19 09:52:53 -04:00
Jay Foad ea09a426a9 [AMDGPU] Assume getDefIgnoringCopies will succeed. NFC.
getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on
valid MIR, so don't bother to check for failure.
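
The shape of the simplification, sketched with a hypothetical wrapper
(getDefIgnoringCopies itself is the real GlobalISel utility):
```
#include "llvm/CodeGen/GlobalISel/Utils.h"
#include <cassert>
using namespace llvm;

// Illustrative shape of the change: treat a failed copy-chase as a bug
// in the input MIR rather than a recoverable condition.
static MachineInstr &getDefOrDie(Register Reg,
                                 const MachineRegisterInfo &MRI) {
  MachineInstr *Def = getDefIgnoringCopies(Reg, MRI);
  assert(Def && "getDefIgnoringCopies must succeed on valid MIR");
  return *Def;
}
```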

Differential Revision: https://reviews.llvm.org/D136238
2022-10-19 11:10:00 +01:00
Caroline Concatto 579ca5e7e1 [AArch64] Replace sme-i64 by sme-i16i64 and sme-f64 by sme-f64f64
The names on developer.arm.com for these SME features are:
  HaveSMEI16I64 and HaveSMEF64F64
so the new flag names are consistent with the documentation page.

Reviewed By: sdesmalen, c-rhodes

Differential Revision: https://reviews.llvm.org/D135974
2022-10-19 10:56:46 +01:00
Juan Manuel MARTINEZ CAAMAÑO bb24b2c610 [AMDGPU][Backend] Fix use-after-free in AMDGPUReleaseVGPRs::isLastVGPRUseVMEMStore
Reviewed By: jpages, arsenm

Differential Revision: https://reviews.llvm.org/D134641
2022-10-19 04:38:16 -05:00
luxufan 82c820b95c [RISCV] Enable the LocalStackSlotAllocation pass support
For RISC-V, load/store instructions (excluding vector loads/stores)
only have a 12-bit immediate operand. If an offset is out of range, a
temporary register must be used to make up the offset. If a group of
such offsets lie within a small (isInt<12>) distance of each other, the
LocalStackSlotAllocation pass can pick a value as the frame base
register's value and replace each original offset with that register's
value plus the relative offset.
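
An illustrative rendering of the reachability test; reachableFromBase
is hypothetical, while isInt<12> is the real MathExtras predicate:
```
#include "llvm/Support/MathExtras.h"
#include <cstdint>

// Hypothetical helper: a slot at absolute offset Off is addressable from
// a frame base register holding B iff the relative offset fits in the
// signed 12-bit immediate field of a RISC-V load/store.
static bool reachableFromBase(int64_t Off, int64_t B) {
  return llvm::isInt<12>(Off - B);
}
```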

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98101
2022-10-19 16:15:14 +08:00
Freddy Ye 3ee58e2f35 [X86] Add WRMSRNS instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135935
2022-10-19 13:04:11 +08:00
Craig Topper 7a4e56acac [RISCV] Add an early out to lowerVECTOR_SHUFFLEAsVSlidedown. NFC
If Mask[0] is 0, then we're never going to match a slidedown. If
we get through the for loop, then it's an identity mask which should
have already been optimized out. Otherwise it's some non-contiguous
mask that will fail out of the loop. Might as well not bother entering
the loop.
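
A hypothetical standalone rendering of the early out (not the actual
lowering code):
```
#include "llvm/ADT/ArrayRef.h"

// A vslidedown-by-N shuffle mask looks like <N, N+1, ...> with N != 0,
// so a mask whose first element is 0 can never match one; skip the scan.
static bool worthScanningForSlidedown(llvm::ArrayRef<int> Mask) {
  return !Mask.empty() && Mask[0] != 0;
}
```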
2022-10-18 21:35:15 -07:00
Freddy Ye e3df4ba9d2 [X86] Add MSRLIST instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: skan, RKSimon

Differential Revision: https://reviews.llvm.org/D135934
2022-10-19 10:35:42 +08:00
Weining Lu 771aee91c8 Reland "[LoongArch] Fix codegen of atomicrmw nand"
Fix invalid RISCV-like MI being emitted for performing the `not`
operation: the LoongArch `xori` zero-extends the immediate, hence is
not equivalent to RISCV `xori`. The LoongArch `not` is a `nor` with
zero.
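
For orientation, the differing `xori` semantics rendered as reference
C++ (loongarch_not is an illustrative helper, per the ISA manuals):
```
#include <cstdint>

// RISCV:     xori rd, rs, imm   => rd = rs ^ sign_extend(imm12),
//            so `xori rd, rs, -1` computes ~rs.
// LoongArch: xori rd, rs, ui12  => rd = rs ^ zero_extend(ui12),
//            which can never produce ~rs; `not` is a nor with zero:
static uint64_t loongarch_not(uint64_t rs) {
  return ~(rs | 0); // nor rd, rs, $zero
}
```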

Patch by lrzlin (Lin Runze).

Differential Revision: https://reviews.llvm.org/D136021
2022-10-19 10:05:35 +08:00
Chen Zheng df9d60af1f [PowerPC] handle more than two predecessors loop header in ctrloop pass
After ISel, a "valid" loop header, which has two predecessors (the
preheader and the latch), may be transformed by some optimizations,
such as the tail duplicator, to have more than two predecessors, if
the old header's successor (which will become the new header) is a
sub-loop.

The predecessors of the new loop header are then the preheader, the
loop latch, and the loop latch(es) of the sub-loop (the old header's
successor).

Before this patch, the ctrloop pass assumed two predecessors for a
candidate loop header. This patch fixes that case.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D135846
2022-10-19 01:11:58 +00:00
Weining Lu 0f374ca5cd Revert "[LoongArch] Fix codegen of atomicrmw nand"
This reverts commit 9572406bbc.

The author name is wrong.
2022-10-19 07:56:23 +08:00
Eli Friedman d6481dc88c [AArch64][Windows] Add MC support for save_any_reg.
Representing this as 12 separate operations is a bit ugly, but
trying to represent the different modes using a bitfield seemed worse.

Differential Revision: https://reviews.llvm.org/D135417
2022-10-18 11:45:27 -07:00
Krzysztof Parzyszek 6a8cfe9a72 [Hexagon] Use shifts by scalar for funnel shifts by scalar
HVX has vector shifts by a scalar register. Use those in the expansions
of funnel shifts where profitable.
2022-10-18 09:49:17 -07:00
Chris Bieneman 6e05c8dfc8 [DX] Create globals for DXContainer parts
DXContainer files have a handful of sections that need to be written.
This adds a pass to write the section data into IR globals, and writes
the shader flag data into a global.

The test cases here verify that the shader flags are correctly written
from the IR into the global and emitted to the DXContainer.

This change also fixes a bug in the MCDXContainerWriter, where the size
of the dxbc::ProgramHeader was not being included in the part offset
calculations. This is verified to be working by the new test cases, where
obj2yaml can properly dump part data for parts after the DXIL part.

Resolves issue #57742 (https://github.com/llvm/llvm-project/issues/57742)

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135793
2022-10-18 11:48:08 -05:00
Mingming Liu 34d18fd241 [AArch64] Enhance bit-field-positioning op matcher to see through 'any_extend' for pattern 'and(any_extend(shl(val, N)), shifted-mask)'
Before this patch (and refactor patch D135843), isBitfieldPositioningOp won't handle "and(any_extend(shl(val, N)), shifted-mask)" (it bails out if the operand of the AND is not a SHL).

After this patch, isBitfieldPositioningOp will see through "any_extend" to find the "shl" and recognize possible bit-field-positioning nodes.

https://gcc.godbolt.org/z/3ncGKbGW6 shows a four-line LLVM IR example that can be optimized to UBFIZ (see the added test case test_and_extended_shift_with_imm in llvm/test/CodeGen/AArch64/bitfield-insert.ll). One existing test case also improves.
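
A hypothetical C-level version of such a pattern (pack is an
illustrative function, not the godbolt snippet itself):
```
#include <cstdint>

// A 32-bit shift widened (any_extend) to 64 bits, then masked with a
// shifted mask. With this patch the whole expression can become a
// single UBFIZ (insert the low 8 bits of val at bit position 3).
static uint64_t pack(uint32_t val) {
  return (uint64_t)(val << 3) & 0x7f8u;
}
```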

Differential Revision: https://reviews.llvm.org/D135852
2022-10-18 09:07:14 -07:00
Han-Kuan Chen 615af94dc2 [RISCV] Lower VECTOR_SHUFFLE to VSLIDEDOWN_VL.
Differential Revision: https://reviews.llvm.org/D136136
2022-10-18 08:58:39 -07:00
Anton Sidorenko 1978b4d968 [MachineCombiner][RISCV] Enable MachineCombiner for RISCV
Initial implementation to match basic FP reassociation patterns.

Differential Revision: https://reviews.llvm.org/D135264
2022-10-18 18:56:32 +03:00
Krzysztof Parzyszek 9fde8e907b [Hexagon] Fix MULHS lowering for HVX v60
The carry bit from an intermediate addition was not properly propagated.
For example mulhs(7fffffff, 7fffffff) was evaluated as 3ffeffff, while
the correct result is 3fffffff.
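
A reference check of the expected value in plain C++, independent of
HVX (mulhs here is just the textbook semantics):
```
#include <cassert>
#include <cstdint>

// Reference mulhs: the high 32 bits of the signed 64-bit product.
static int32_t mulhs(int32_t a, int32_t b) {
  return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

int main() {
  // The case from above: the dropped carry yielded 0x3ffeffff.
  assert(mulhs(0x7fffffff, 0x7fffffff) == 0x3fffffff);
}
```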
2022-10-18 07:54:38 -07:00
Anton Afanasyev e175f99c49 Revert "[MachineCombiner][RISCV] Enable MachineCombiner for RISCV"
This reverts commit 3112cf3b00.
Test breakage: https://lab.llvm.org/buildbot/#/builders/16/builds/36631
2022-10-18 15:57:11 +03:00
Weining Lu 9572406bbc [LoongArch] Fix codegen of atomicrmw nand
Fix invalid RISCV-like MI being emitted for performing the `not`
operation: the LoongArch `xori` zero-extends the immediate, hence is
not equivalent to RISCV `xori`. The LoongArch `not` is a `nor` with
zero.

Differential Revision: https://reviews.llvm.org/D136021
2022-10-18 20:39:20 +08:00
Anton Sidorenko 3112cf3b00 [MachineCombiner][RISCV] Enable MachineCombiner for RISCV
Initial implementation to match basic FP reassociation patterns.

Differential Revision: https://reviews.llvm.org/D135264
2022-10-18 15:31:03 +03:00
LiaoChunyu 7b970290c0 [RISCV] Optimize SELECT_CC when the true value of select is Constant
(select (setcc lhs, rhs, CC), constant, falsev) -> (select (setcc lhs, rhs, InverseCC), falsev, constant)

This patch removes unnecessary copies

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D129757
2022-10-18 09:24:17 +08:00
Koakuma d3fcbee10d [SPARC] Make calls to function with big return values work
Implement CanLowerReturn and associated CallingConv changes for SPARC/SPARC64.

In particular, for SPARC64 there are new `RetCC_Sparc64_*` functions that handle the return side of the calling convention.
They use the same analysis as the `CC_Sparc64_*` family of functions, but fail if the return value doesn't fit into the return registers.

This makes calls to functions with big return values get converted to an sret function as expected, instead of crashing LLVM.
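
For orientation, what the sret conversion means at the source level
(illustrative sketch):
```
// A return value too large for the return registers...
struct Big { long v[8]; };
Big make();
// ...is demoted to an sret call, as if the function were declared
//   void make(Big *result);
// with the caller passing a hidden pointer. CanLowerReturn returning
// false is what triggers this demotion in SelectionDAG.
```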

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D132465
2022-10-18 00:01:55 +00:00
Xiang Li 13163dd8ab [HLSL] CodeGen hlsl resource binding.
`register(ID, space)`, e.g. register(t3, space1), will be translated into
i32 3, i32 1 as the last two operands of the resource annotation metadata.

NamedMetadata for CBuffers and SRVs are added as "hlsl.srvs" and "hlsl.cbufs".

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D130951
2022-10-17 14:29:19 -07:00
Craig Topper 2b32e4f98b [RISCV] Add basic support for the sifive-7-series short forward branch optimization.
sifive-7-series has macrofusion support to convert a branch over
a single instruction into a conditional instruction. This can be
an improvement if the branch is hard to predict.

This patch adds support for the most basic case, a branch over a
move instruction. This is implemented as a pseudo instruction so
we can hide the control flow until all code motion passes complete.

I've disabled a recent select optimization if this feature is enabled
in the subtarget.

Related gcc patch for the same optimization https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg211045.html
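
An illustrative source-level shape of the pattern (registers and names
are hypothetical):
```
// A conditional select like this one often lowers to a short forward
// branch over a single register move:
//
//     blt  a0, a1, 1f   # branch over one instruction
//     mv   a2, a3       # the single skipped move
//   1:
//
// sifive-7-series can macrofuse that pair into an effectively
// conditional move; the pseudo instruction keeps the pair together
// until the code motion passes have run.
int sel(int a, int b, int c, int d) { return a < b ? c : d; }
```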

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135814
2022-10-17 13:56:22 -07:00
Han Zhu d0d48a91f8 [X86] Lower vector interleave into unpck and perm
[This Godbolt link](https://godbolt.org/z/s17Kv1s9T) shows different codegen between clang and gcc for a transpose operation.

clang result:
```
        vmovdqu xmm0, xmmword ptr [rcx + rax]
        vmovdqu xmm1, xmmword ptr [rcx + rax + 16]
        vmovdqu xmm2, xmmword ptr [r8 + rax]
        vmovdqu xmm3, xmmword ptr [r8 + rax + 16]
        vpunpckhbw      xmm4, xmm2, xmm0
        vpunpcklbw      xmm0, xmm2, xmm0
        vpunpcklbw      xmm2, xmm3, xmm1
        vpunpckhbw      xmm1, xmm3, xmm1
        vmovdqu xmmword ptr [rdi + 2*rax + 48], xmm1
        vmovdqu xmmword ptr [rdi + 2*rax + 32], xmm2
        vmovdqu xmmword ptr [rdi + 2*rax], xmm0
        vmovdqu xmmword ptr [rdi + 2*rax + 16], xmm4
```
gcc result:
```
        vmovdqu ymm3, YMMWORD PTR [rdi+rax]
        vpunpcklbw      ymm1, ymm3, YMMWORD PTR [rsi+rax]
        vpunpckhbw      ymm0, ymm3, YMMWORD PTR [rsi+rax]
        vperm2i128      ymm2, ymm1, ymm0, 32
        vperm2i128      ymm1, ymm1, ymm0, 49
        vmovdqu YMMWORD PTR [rcx+rax*2], ymm2
        vmovdqu YMMWORD PTR [rcx+32+rax*2], ymm1
```
clang's code is roughly 15% slower than gcc's when evaluated on an internal compression benchmark.

The loop vectorizer generates the following shufflevector intrinsic:
```
%interleaved.vec = shufflevector <32 x i8> %a, <32 x i8> %b, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63>
```
which is lowered to SelectionDAG:
```
t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0
t6: v64i8 = concat_vectors t2, undef:v32i8
t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1
t7: v64i8 = concat_vectors t4, undef:v32i8
t8: v64i8 = vector_shuffle<0,64,1,65,2,66,3,67,4,68,5,69,6,70,7,71,8,72,9,73,10,74,11,75,12,76,13,77,14,78,15,79,16,80,17,81,18,82,19,83,20,84,21,85,22,86,23,87,24,88,25,89,26,90,27,91,28,92,29,93,30,94,31,95> t6, t7
```

So far this `vector_shuffle` is good enough for us to pattern-match and transform, but as we go down the SelectionDAG pipeline, it gets split into smaller shuffles. During dagcombine1, the shuffle is split by `foldShuffleOfConcatUndefs`.
```
  // shuffle (concat X, undef), (concat Y, undef), Mask -->
  // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)
t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0
t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1
t19: v32i8 = vector_shuffle<0,32,1,33,2,34,3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47> t2, t4
t15: ch,glue = CopyToReg t0, Register:v32i8 $ymm0, t19
t20: v32i8 = vector_shuffle<16,48,17,49,18,50,19,51,20,52,21,53,22,54,23,55,24,56,25,57,26,58,27,59,28,60,29,61,30,62,31,63> t2, t4
t17: ch,glue = CopyToReg t15, Register:v32i8 $ymm1, t20, t15:1
```

With `foldShuffleOfConcatUndefs` commented out, the vector is still split later by the type legalizer, which comes after dagcombine1, because v64i8 is not a legal type in AVX2 (64 * 8 = 512 bits while ymm = 256 bits). There doesn't seem to be a good way to avoid this split. Lowering the `vector_shuffle` into unpck and perm during dagcombine1 is too early. Therefore, although somewhat inconvenient, we decided to go with pattern-matching a pair of vector shuffles later in the SelectionDAG pipeline, as part of `lowerV32I8Shuffle`.

The code looks at the two operands of the first shuffle it encounters, iterates through the users of the operands, and tries to find two shuffles that are consecutive interleaves. Once the pattern is found, it lowers them into unpcks and perms. It returns the perm for the shuffle that is currently being lowered (letting ISel modify the DAG), and replaces the other shuffle in place.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134477
2022-10-17 11:39:27 -07:00
Fangrui Song 5d3139aef1 [AArch64] Fix warnings 2022-10-17 16:58:52 +00:00
Mingming Liu db0286a096 [AArch64]Enhance 'isBitfieldPositioningOp' to find pattern (shl(and(val,mask), N).
Before this patch (and D135844)

- Given DAG node shl(op, N), isBitfieldPositioningOp uses (optionally shifted [1]) op as the Src (the least significant bits of Src are inserted into DstLSB of the Dst node).

After this patch

- If op is and(val, mask), isBitfieldPositioningOp tries to see through the `and` and determine whether val is a simpler source than op.

It helps in a similar (probably symmetric) way to how isSeveralBitsExtractOpFromShr [2] optimizes isBitfieldExtractOpFromShr.

Existing test cases are improved without regressions.

[1] cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2546)
[2] cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2057)

Differential Revision: https://reviews.llvm.org/D135850
2022-10-17 09:01:29 -07:00
Mingming Liu 45cadb4bd3 [AArch64][NFC]Refactor 'isBitfieldPositioningOp' so that DAG nodes with different Opcode are handled with separate helper functions.
Using different helper functions for DAG nodes with different Opcode allows specialization.

- 'isBitfieldExtractOp' [1] shows how specialization based on Opcode could catch more patterns.
- The refactor paves the way (e.g., makes diff clearer) for enhancement in {D135844,D135850,D135852}

[1] cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2163-L2202)

Differential Revision: https://reviews.llvm.org/D135843
2022-10-17 08:08:48 -07:00
Nicola Lancellotti 43fe14c056 [AArch64] Canonicalize ZERO_EXTEND to VSELECT
Differential Revision: https://reviews.llvm.org/D135596
2022-10-17 15:42:46 +01:00
Kazu Hirata 5ea3155565 [llvm] Use llvm::find (NFC) 2022-10-16 16:21:00 -07:00
Kazu Hirata 7820a30a1b [AMDGPU] Use llvm::any_of (NFC) 2022-10-16 09:19:09 -07:00
Amara Emerson 13792ba417 [AArch64][GlobalISel] When lowering signext i1 parameters, don't zero-extend to s8 first.
Fixes https://github.com/llvm/llvm-project/issues/57181
2022-10-15 20:25:43 -07:00
wanglei 506e936871 [LoongArch] Fix wrong VariantKind for MO_GOT_PC_{HI/LO} flags
Differential Revision: https://reviews.llvm.org/D135946
2022-10-15 17:45:08 +08:00
Kazushi (Jam) Marukawa 0278c9ceb6 [VE] Change the way to lower select
Change to use VEISD::CMOV in combineSelect for better optimization.
Support VEISD::CMOV in combineTRUNCATE also, to optimize truncate.
Merge functions handling condition codes into VE.h, and add basic
CMOV patterns to VEInstrInfo.td. Also update regression tests.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D135878
2022-10-15 08:49:36 +09:00
Krzysztof Parzyszek fb063ea2ea [Hexagon] Clean up leftover instructions in HvxIdioms
Quick and dirty fix, because this is causing one builder to fail.
2022-10-14 16:45:03 -07:00
Krzysztof Parzyszek 6cb2a02a38 [Hexagon] Report if changes were made in HvxIdioms pass
This should fix
```
Pass modifies its input and doesn't report it: Hexagon Vector Combine
Pass modifies its input and doesn't report it UNREACHABLE executed at
[...hecks-debian/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1436!
```
2022-10-14 15:46:33 -07:00
Krzysztof Parzyszek 361a27c155 [Hexagon] Recognize idioms for fixed-point vector multiplication
Recognize Q.15*Q.15 and Q.31*Q.31, with and without rounding.
2022-10-14 15:22:25 -07:00
Martin Storsjö 6eb205b257 Reapply [AArch64] Fix aligning the stack after calling __chkstk
Whenever a call to __chkstk was made, the frame lowering previously
skipped aligning the stack (as NumBytes was reset to zero before doing
the alignment).

This fixes https://github.com/llvm/llvm-project/issues/56182.

The initial version of this produced invalid code for small
functions with no local stack allocations, if those functions
were marked with the "stackrealign" attribute. If building
with -mstack-alignment=16 (which otherwise mostly would be a
no-op), this attribute is added on the main function.

Differential Revision: https://reviews.llvm.org/D135687
2022-10-15 00:40:13 +03:00
Krzysztof Parzyszek b465a98316 [Hexagon] Fix isTypeForHVX for vector predicates
HexagonSubtarget::isTypeForHVX would stop breaking the type up when it
reached 64 bits in width. HVX vector predicates can be shorter than that,
for example <32 x i1> would have a bitwidth of 32, and it's still a valid
HVX type.
2022-10-14 14:38:41 -07:00
Krzysztof Parzyszek 705e77abed [Hexagon] Lower funnel shifts for HVX
HVX v62+ has bidirectional shifts, which do not mask the shift amount to
the bit width. Instead, the shift amount is sign-extended from the log(BW)
bit value, and a negative value causes a shift in the other direction.
For the shift amount being -log(BW), this reversed shift will shift all
bits out, inserting 0s or sign bits depending on the type and direction.
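
For reference, the funnel-shift semantics being lowered, rendered from
the LangRef definition of fshl for 32-bit lanes:
```
#include <cstdint>

// fshl concatenates a:b, shifts left by s mod 32, and keeps the
// most-significant word.
static uint32_t fshl32(uint32_t a, uint32_t b, uint32_t s) {
  s &= 31;
  return s == 0 ? a : (a << s) | (b >> (32 - s));
}
```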
2022-10-14 14:13:18 -07:00