llvm-project

Commit Graph

Author	SHA1	Message	Date
Krzysztof Parzyszek	e5d9ab08c3	[Hexagon] Fix insertion point for pointer difference calculation HVC::calculatePointerDifference inserts temporary instructions for simplification, and calulation of known bits. These instructions were inserted at the end of a basic block (after the terminator), which caused BB->getTerminator() to return nullptr. This, in turn, caused a crash when a PHI instruction was examined in computeKnownBits.	2022-10-19 14:23:39 -07:00
Michal Paszkowski	6beac40fe4	[SPIR-V] Add get_image_num_mip_levels implementation Differential Revision: https://reviews.llvm.org/D135904	2022-10-19 22:29:16 +02:00
Michal Paszkowski	5fb4a05148	[SPIR-V] Add atomic_init and fix atomic explicit lowering Differential Revision: https://reviews.llvm.org/D135902	2022-10-19 22:13:29 +02:00
Chris Bieneman	607be386e7	[DX] Fix missing preserved analysis The ShaderFlagsAnalysisWrapper needs to be marked to preserve all analyssis. Fixes #58474 (https://github.com/llvm/llvm-project/issues/58474)	2022-10-19 12:11:03 -05:00
Sander de Smalen	36864d47d6	[AArch64] Fix minor issue introduced in D135950. The Key for the SubtargetMap had the StreamingSVEModeDisabled in the wrong place. This change is non-functional, since the string (key) is still unique.	2022-10-19 17:01:41 +00:00
Caroline Concatto	2ecbe8c38c	[AArch64] SME2 Single-multi vector ternary int/FP 2 and 4 registers This patch adds the assembly/disassembly for the following instructions: For INT: ADD(array results, multiple and single vector): Add replicated single vector to multi-vector with ZA array vector results. SUB(array results, multiple and single vector): Subtract replicated single vector from multi-vector with ZA array vector results. For FP: FMLA (multiple and single vector): Multi-vector floating-point fused multiply-add by vector. FMLS (multiple and single vector): Multi-vector floating-point multiply-subtract long by vector. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 The Matriz Operand has 2 new sizes 32(.s) and 64(.d) bits (MatrixOp32 and MatrixOp64) Depends on: D135448 Depends on: D135952 Differential Revision: https://reviews.llvm.org/D135455	2022-10-19 17:49:48 +01:00
Sander de Smalen	137459aff6	[AArch64][SME] Disable (SLP\|Loop)Vectorizer when function may be executed in streaming mode. When the SME attributes tell that a function is or may be executed in Streaming SVE mode, we currently need to be conservative and disable _any_ vectorization (fixed or scalable) because the code-generator does not yet support generating streaming-compatible code. Scalable auto-vec will be gradually enabled in the future when we have confidence that the loop-vectorizer won't use any SVE or NEON instructions that are illegal in Streaming SVE mode. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D135950	2022-10-19 16:42:20 +00:00
Phoebe Wang	bc1819389f	[X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics This is an alternative of D120395 and D120411. Previously we use `__bfloat16` as a typedef of `unsigned short`. The name may give user an impression it is a brand new type to represent BF16. So that they may use it in arithmetic operations and we don't have a good way to block it. To solve the problem, we introduced `__bf16` to X86 psABI and landed the support in Clang by D130964. Now we can solve the problem by switching intrinsics to the new type. Reviewed By: LuoYuanke, RKSimon Differential Revision: https://reviews.llvm.org/D132329	2022-10-19 23:47:04 +08:00
Jay Foad	f0ca946bf9	[AMDGPU] New helper function SIInsertWaitcnts::getVmemWaitEventType This just commons up and simplifies some logic that was repeated in SIInsertWaitcnts::updateEventWaitcntAfter. NFCI. Differential Revision: https://reviews.llvm.org/D136253	2022-10-19 16:22:50 +01:00
Joe Nash	ad6698562c	[AMDGPU] V_LDEXP_F16 encoding fix and doc update. The amdgcn.ldexp.* intrinsics take an i32 value as src1. The V_LDEXP_F16 instruction considers src1 an f16 operand, and therefore src1 is implicitly truncated to 16 bits when lowering to that instruction from the intrinsic. This is unlikely to result in an error in practice because values that large are not useful. The operand class of src1 in the True16 version of the instruction has been corrected to encode correctly on GFX11. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D136195	2022-10-19 09:52:53 -04:00
Jay Foad	ea09a426a9	[AMDGPU] Assume getDefIgnoringCopies will succeed. NFC. getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on valid MIR, so don't bother to check for failure. Differential Revision: https://reviews.llvm.org/D136238	2022-10-19 11:10:00 +01:00
Caroline Concatto	579ca5e7e1	[AArch64] Replace sme-i64 by sme-i16i64 and sme-f64 by sme-f64f64 The names in developer.arm for these SME features are: HaveSMEI16I64 and HaveSMEF64F64 so the new flag names are consistent with the documentation page Reviewed By: sdesmalen, c-rhodes Differential Revision: https://reviews.llvm.org/D135974	2022-10-19 10:56:46 +01:00
Juan Manuel MARTINEZ CAAMAÑO	bb24b2c610	[AMDGPU][Backend] Fix user-after-free in AMDGPUReleaseVGPRs::isLastVGPRUseVMEMStore Reviewed By: jpages, arsenm Differential Revision: https://reviews.llvm.org/D134641	2022-10-19 04:38:16 -05:00
luxufan	82c820b95c	[RISCV] Enable the LocalStackSlotAllocation pass support For RISC-V, load/store(exclude vector load/store) instructions only has a 12 bit immediate operand. If the offset is out-of-range, it must make use of a temp register to make up this offset. If between these offsets, they have a small(IsInt<12>) relative offset, LocalStackSlotAllocation pass can find a value as frame base register's value, and replace the origin offset with this register's value plus the relative offset. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98101	2022-10-19 16:15:14 +08:00
Freddy Ye	3ee58e2f35	[X86] Add WRMSRNS instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D135935	2022-10-19 13:04:11 +08:00
Craig Topper	7a4e56acac	[RISCV] Add an early out to lowerVECTOR_SHUFFLEAsVSlidedown. NFC If Mask[0] is 0, then we're never going to match a slidedown. If we get through the for loop, then it's an identity mask which should have already been optimized out. Otherwise it's some non-contiguous mask that will fail out of the lop. Might as well not bother entering the loop.	2022-10-18 21:35:15 -07:00
Freddy Ye	e3df4ba9d2	[X86] Add MSRLIST instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: skan, RKSimon Differential Revision: https://reviews.llvm.org/D135934	2022-10-19 10:35:42 +08:00
Weining Lu	771aee91c8	Reland "[LoongArch] Fix codegen of atomicrmw nand" Fix invalid RISCV-like MI being emitted for performing the `not` operation: the LoongArch `xori` zero-extends the immediate, hence is not equivalent to RISCV `xori`. The LoongArch `not` is a `nor` with zero. Patch by lrzlin (Lin Runze). Differential Revision: https://reviews.llvm.org/D136021	2022-10-19 10:05:35 +08:00
Chen Zheng	df9d60af1f	[PowerPC] handle more than two predecessors loop header in ctrloop pass After ISEL, the "valid" loop header which has two predecessors (one is preheader and the other one is latch) may be transformed to have more than two predecessors by some optimizations, like tail duplicator, if the old header's successor(will be changed to new header) is a sub loop. The predecessors of the new loop header are preheader, loop latch and the loop latch(es) of the sub loop(old header's successor). Before the patch, ctrloop pass assumes two predecessors for candidate loop header. This patch fixes this case. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D135846	2022-10-19 01:11:58 +00:00
Weining Lu	0f374ca5cd	Revert "[LoongArch] Fix codegen of atomicrmw nand" This reverts commit `9572406bbc`. The author name is wrong.	2022-10-19 07:56:23 +08:00
Eli Friedman	d6481dc88c	[AArch64][Windows] Add MC support for save_any_reg. Representing this as 12 separate operations is a bit ugly, but trying to represent the different modes using a bitfield seemed worse. Differential Revision: https://reviews.llvm.org/D135417	2022-10-18 11:45:27 -07:00
Krzysztof Parzyszek	6a8cfe9a72	[Hexagon] Use shifts by scalar for funnel shifts by scalar HVX has vector shifts by a scalar register. Use those in the expansions of funnel shifts where profitable.	2022-10-18 09:49:17 -07:00
Chris Bieneman	6e05c8dfc8	[DX] Create globals for DXContainer parts DXContainer files have a handful of sections that need to be written. This adds a pass to write the section data into IR globals, and writes the shader flag data into a global. The test cases here verify that the shader flags are correctly written from the IR into the global and emitted to the DXContainer. This change also fixes a bug in the MCDXContainerWriter, where the size of the dxbc::ProgramHeader was not being included in the part offset calcuations. This is verified to be working by the new testcases where obj2yaml can properly dump part data for parts after the DXIL part. Resolves issue #57742 (https://github.com/llvm/llvm-project/issues/57742) Reviewed By: python3kgae Differential Revision: https://reviews.llvm.org/D135793	2022-10-18 11:48:08 -05:00
Mingming Liu	34d18fd241	[AArch64] Enhance bit-field-positioning op matcher to see through 'any_extend' for pattern 'and(any_extend(shl(val, N)), shifted-mask)' Before this patch (and refactor patch D135843), isBitfieldPositioningOp won't handle "and(any_extend(shl(val, N), shifted-mask)" (bail out if AND op is not SHL) After this patch, isBitfieldPositioningOp will see through "any_extend" to find "shl" to find possible bit-field-positioning nodes. https://gcc.godbolt.org/z/3ncGKbGW6 is a four-liner LLVM IR that could be optimized to UBFIZ (see added test case test_and_extended_shift_with_imm in llvm/test/CodeGen/AArch64/bitfield-insert.ll). One existing test case also improves. Differential Revision: https://reviews.llvm.org/D135852	2022-10-18 09:07:14 -07:00
Han-Kuan Chen	615af94dc2	[RISCV] Lower VECTOR_SHUFFLE to VSLIDEDOWN_VL. Differential Revision: https://reviews.llvm.org/D136136	2022-10-18 08:58:39 -07:00
Anton Sidorenko	1978b4d968	[MachineCombiner][RISCV] Enable MachineCombiner for RISCV Initial implementation to match basic FP reassociation patterns. Differential Revision: https://reviews.llvm.org/D135264	2022-10-18 18:56:32 +03:00
Krzysztof Parzyszek	9fde8e907b	[Hexagon] Fix MULHS lowering for HVX v60 The carry bit from an intermediate addition was not properly propagated. For example mulhs(7fffffff, 7fffffff) was evaluated as 3ffeffff, while the correct result is 3fffffff.	2022-10-18 07:54:38 -07:00
Anton Afanasyev	e175f99c49	Revert "[MachineCombiner][RISCV] Enable MachineCombiner for RISCV" This reverts commit `3112cf3b00`. Test breakage: https://lab.llvm.org/buildbot/#/builders/16/builds/36631	2022-10-18 15:57:11 +03:00
Weining Lu	9572406bbc	[LoongArch] Fix codegen of atomicrmw nand Fix invalid RISCV-like MI being emitted for performing the `not` operation: the LoongArch `xori` zero-extends the immediate, hence is not equivalent to RISCV `xori`. The LoongArch `not` is a `nor` with zero. Differential Revision: https://reviews.llvm.org/D136021	2022-10-18 20:39:20 +08:00
Anton Sidorenko	3112cf3b00	[MachineCombiner][RISCV] Enable MachineCombiner for RISCV Initial implementation to match basic FP reassociation patterns. Differential Revision: https://reviews.llvm.org/D135264	2022-10-18 15:31:03 +03:00
LiaoChunyu	7b970290c0	[RISCV] Optimize SELECT_CC when the true value of select is Constant (select (setcc lhs, rhs, CC), constant, falsev) -> (select (setcc lhs, rhs, InverseCC), falsev, constant) This patch removes unnecessary copies Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D129757	2022-10-18 09:24:17 +08:00
Koakuma	d3fcbee10d	[SPARC] Make calls to function with big return values work Implement CanLowerReturn and associated CallingConv changes for SPARC/SPARC64. In particular, for SPARC64 there's new `RetCC_Sparc64_` functions that handles the return case of the calling convention. It uses the same analysis as `CC_Sparc64_` family of funtions, but fails if the return value doesn't fit into the return registers. This makes calls to functions with big return values converted to an sret function as expected, instead of crashing LLVM. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D132465	2022-10-18 00:01:55 +00:00
Xiang Li	13163dd8ab	[HLSL] CodeGen hlsl resource binding. ''register(ID, space)'' like register(t3, space1) will be translated into i32 3, i32 1 as the last 2 operands for resource annotation metadata. NamedMetadata for CBuffers and SRVs are added as "hlsl.srvs" and "hlsl.cbufs". Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D130951	2022-10-17 14:29:19 -07:00
Craig Topper	2b32e4f98b	[RISCV] Add basic support for the sifive-7-series short forward branch optimization. sifive-7-series has macrofusion support to convert a branch over a single instruction into a conditional instruction. This can be an improvement if the branch is hard to predict. This patch adds support for the most basic case, a branch over a move instruction. This is implemented as a pseudo instruction so we can hide the control flow until all code motion passes complete. I've disabled a recent select optimization if this feature is enabled in the subtarget. Related gcc patch for the same optimization https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg211045.html Reviewed By: reames Differential Revision: https://reviews.llvm.org/D135814	2022-10-17 13:56:22 -07:00
Han Zhu	d0d48a91f8	[X86] Lower vector interleave into unpck and perm [This Godbolt link](https://godbolt.org/z/s17Kv1s9T) shows different codegen between clang and gcc for a transpose operation. clang result: ``` vmovdqu xmm0, xmmword ptr [rcx + rax] vmovdqu xmm1, xmmword ptr [rcx + rax + 16] vmovdqu xmm2, xmmword ptr [r8 + rax] vmovdqu xmm3, xmmword ptr [r8 + rax + 16] vpunpckhbw xmm4, xmm2, xmm0 vpunpcklbw xmm0, xmm2, xmm0 vpunpcklbw xmm2, xmm3, xmm1 vpunpckhbw xmm1, xmm3, xmm1 vmovdqu xmmword ptr [rdi + 2rax + 48], xmm1 vmovdqu xmmword ptr [rdi + 2rax + 32], xmm2 vmovdqu xmmword ptr [rdi + 2rax], xmm0 vmovdqu xmmword ptr [rdi + 2rax + 16], xmm4 ``` gcc result: ``` vmovdqu ymm3, YMMWORD PTR [rdi+rax] vpunpcklbw ymm1, ymm3, YMMWORD PTR [rsi+rax] vpunpckhbw ymm0, ymm3, YMMWORD PTR [rsi+rax] vperm2i128 ymm2, ymm1, ymm0, 32 vperm2i128 ymm1, ymm1, ymm0, 49 vmovdqu YMMWORD PTR [rcx+rax2], ymm2 vmovdqu YMMWORD PTR [rcx+32+rax2], ymm1 ``` clang's code is roughly 15% slower than gcc's when evaluated on an internal compression benchmark. The loop vectorizer generates the following shufflevector intrinsic: ``` %interleaved.vec = shufflevector <32 x i8> %a, <32 x i8> %b, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63> ``` which is lowered to SelectionDAG: ``` t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0 t6: v64i8 = concat_vectors t2, undef:v32i8 t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1 t7: v64i8 = concat_vectors t4, undef:v32i8 t8: v64i8 = vector_shuffle<0,64,1,65,2,66,3,67,4,68,5,69,6,70,7,71,8,72,9,73,10,74,11,75,12,76,13,77,14,78,15,79,16,80,17,81,18,82,19,83,20,84,21,85,22,86,23,87,24,88,25,89,26,90,27,91,28,92,29,93,30,94,31,95> t6, t7 ``` So far this `vector_shuffle` is good enough for us to pattern-match and transform, but as we go down the SelectionDAG pipeline, it got split into smaller shuffles. During dagcombine1, the shuffle is split by `foldShuffleOfConcatUndefs`. ``` // shuffle (concat X, undef), (concat Y, undef), Mask --> // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1) t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0 t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1 t19: v32i8 = vector_shuffle<0,32,1,33,2,34,3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47> t2, t4 t15: ch,glue = CopyToReg t0, Register:v32i8 $ymm0, t19 t20: v32i8 = vector_shuffle<16,48,17,49,18,50,19,51,20,52,21,53,22,54,23,55,24,56,25,57,26,58,27,59,28,60,29,61,30,62,31,63> t2, t4 t17: ch,glue = CopyToReg t15, Register:v32i8 $ymm1, t20, t15:1 ``` With `foldShuffleOfConcatUndefs` commented out, the vector is still split later by the type legalizer, which comes after dagcombine1, because v64i8 is not a legal type in AVX2 (64 * 8 = 512 bits while ymm = 256 bits). There doesn't seem to be a good way to avoid this split. Lowering the `vector_shuffle` into unpck and perm during dagcombine1 is too early. Therefore, although somewhat inconvenient, we decided to go with pattern-matching a pair vector shuffles later in the SelectionDAG pipeline, as part of `lowerV32I8Shuffle`. The code looks at the two operands of the first shuffle it encounters, iterates through the users of the operands, and tries to find two shuffles that are consecutive interleaves. Once the pattern is found, it lowers them into unpcks and perms. It returns the perm for the shuffle that's currently being lowered (have ISel modify the DAG), and replaces the other shuffle in place. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134477	2022-10-17 11:39:27 -07:00
Fangrui Song	5d3139aef1	[AArch64] Fix warnings	2022-10-17 16:58:52 +00:00
Mingming Liu	db0286a096	[AArch64]Enhance 'isBitfieldPositioningOp' to find pattern (shl(and(val,mask), N). Before this patch (and D135844) - Given DAG node shl(op, N), isBitfieldPositioningOp uses (optionally shifted [1] ) op as the Src (least significant bits of Src are inserted into DstLSB of Dst node). After this patch - If op is and(val, mask), isBitfieldPositioningOp tries to see through and and find if val is a simpler source than op. It helps in a similar (probably symmetric) way how isSeveralBitsExtractOpFromShr [2] optimizes isBitfieldExtractOpFromShr Existing test cases are improved without regressions. [1] `cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2546)` [2] `cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2057)` Differential Revision: https://reviews.llvm.org/D135850	2022-10-17 09:01:29 -07:00
Mingming Liu	45cadb4bd3	[AArch64][NFC]Refactor 'isBitfieldPositioningOp' so that DAG nodes with different Opcode are handled with separate helper functions. Using different helper functions for DAG nodes with different Opcode allows specialization. - 'isBitfieldExtractOp' [1] shows how specialization based on Opcode could catch more patterns. - The refactor paves the way (e.g., makes diff clearer) for enhancement in {D135844,D135850,D135852} [1] `cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2163-L2202)` Differential Revision: https://reviews.llvm.org/D135843	2022-10-17 08:08:48 -07:00
Nicola Lancellotti	43fe14c056	[AArch64] Canonicalize ZERO_EXTEND to VSELECT Differential Revision: https://reviews.llvm.org/D135596	2022-10-17 15:42:46 +01:00
Kazu Hirata	5ea3155565	[llvm] Use llvm::find (NFC)	2022-10-16 16:21:00 -07:00
Kazu Hirata	7820a30a1b	[AMDGPU] Use llvm::any_of (NFC)	2022-10-16 09:19:09 -07:00
Amara Emerson	13792ba417	[AArch64][GlobalISel] When lowering signext i1 parameters, don't zero-extend to s8 first. Fixes https://github.com/llvm/llvm-project/issues/57181	2022-10-15 20:25:43 -07:00
wanglei	506e936871	[LoongArch] Fix wrong VariantKind for MO_GOT_PC_{HI/LO} flags Differential Revision: https://reviews.llvm.org/D135946	2022-10-15 17:45:08 +08:00
Kazushi (Jam) Marukawa	0278c9ceb6	[VE] Change the way to lower select Change to use VEISD::CMOV in combineSelect for better optimization. Support VEISD::CMOV in combineTRUNCATE also to optimize trancate. Merge functions to handle condition codes to VE.h. And add basic CMOV patterns to VEInstrInfo.td. Update regression tests also. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D135878	2022-10-15 08:49:36 +09:00
Krzysztof Parzyszek	fb063ea2ea	[Hexagon] Clean up leftover instructions in HvxIdioms Quick and dirty fix, because this is causing one builder to fail.	2022-10-14 16:45:03 -07:00
Krzysztof Parzyszek	6cb2a02a38	[Hexagon] Report if changes were made in HvxIdioms pass This should fix ``` Pass modifies its input and doesn't report it: Hexagon Vector Combine Pass modifies its input and doesn't report it UNREACHABLE executed at [...hecks-debian/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1436! ```	2022-10-14 15:46:33 -07:00
Krzysztof Parzyszek	361a27c155	[Hexagon] Recognize idioms for fixed-point vector multiplication Recognize Q.15Q.15 and Q.31Q.31, with and without rounding.	2022-10-14 15:22:25 -07:00
Martin Storsjö	6eb205b257	Reapply [AArch64] Fix aligning the stack after calling __chkstk Whenever a call to __chkstk was made, the frame lowering previously omitted the aligning (as NumBytes was reset to zero before doing alignment). This fixes https://github.com/llvm/llvm-project/issues/56182. The initial version of this produced invalid code for small functions with no local stack allocations, if those functions were marked with the "stackrealign" attribute. If building with -mstack-alignment=16 (which otherwise mostly would be a no-op), this attribute is added on the main function. Differential Revision: https://reviews.llvm.org/D135687	2022-10-15 00:40:13 +03:00
Krzysztof Parzyszek	b465a98316	[Hexagon] Fix isTypeForHVX for vector predicates HexagonSubtarget::isTypeFixHVX would stop breaking the type up when it reached 64 bits in width. HVX vector predicates can be shorter than that, for example <32 x i1> would have a bitwidth of 32, and it's still a valid HVX type.	2022-10-14 14:38:41 -07:00
Krzysztof Parzyszek	705e77abed	[Hexagon] Lower funnel shifts for HVX HVX v62+ has bidirectional shifts, which do not mask the shift amount to the bit width. Instead, the shift amount is sign-extended from the log(BW) bit value, and a negative value causes a shift in the other direction. For the shift amount being -log(BW), this reversed shift will shift all bits out, inserting 0s or sign bits depending on the type and direction.	2022-10-14 14:13:18 -07:00

1 2 3 4 5 ...

69244 Commits