llvm-project

Commit Graph

Author	SHA1	Message	Date
Krzysztof Parzyszek	e5d9ab08c3	[Hexagon] Fix insertion point for pointer difference calculation HVC::calculatePointerDifference inserts temporary instructions for simplification, and calculation of known bits. These instructions were inserted at the end of a basic block (after the terminator), which caused BB->getTerminator() to return nullptr. This, in turn, caused a crash when a PHI instruction was examined in computeKnownBits.	2022-10-19 14:23:39 -07:00
Teresa Johnson	646e25d051	[BitcodeReader] Convert pair to triple in preparation for MemProf (NFC) Extracted from D135714 which adds summary support for MemProf. We will need a 3rd tuple member in the ValueIdToValueInfoMap, this patch makes a number of NFC changes to the existing clients of that map to reflect the conversion of pair to tuple.	2022-10-19 13:34:30 -07:00
Michal Paszkowski	6beac40fe4	[SPIR-V] Add get_image_num_mip_levels implementation Differential Revision: https://reviews.llvm.org/D135904	2022-10-19 22:29:16 +02:00
Michal Paszkowski	5fb4a05148	[SPIR-V] Add atomic_init and fix atomic explicit lowering Differential Revision: https://reviews.llvm.org/D135902	2022-10-19 22:13:29 +02:00
Alexey Bataev	b8b740c834	[SLP][NFC]Remove unused variable, NFC.	2022-10-19 12:35:27 -07:00
Fangrui Song	c80b12d352	Revert D135427 "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally" This reverts commit `8ef3fd8d59`. I mentioned that GlobalAlias was not handled. It turns out GlobalAlias has to be handled in the same patch (as opposed to in a follow-up), as otherwise clang codegen of C5/D5 constructor/destructor would regress (https://reviews.llvm.org/D135427#3869003).	2022-10-19 11:24:12 -07:00
Yuanfang Chen	24c6ea917c	[JMCInstrument] rename ELF section name from ".just.my.code" to ".data.just.my.code" This gives linker scripts a hint about where to place the section.	2022-10-19 10:49:54 -07:00
Prabhdeep Singh Soni	6149589127	[OMPIRBuilder] Support depend clause for task This patch adds support for the `depend` clause for the `task` construct. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D135695	2022-10-19 13:11:43 -04:00
Chris Bieneman	607be386e7	[DX] Fix missing preserved analysis The ShaderFlagsAnalysisWrapper needs to be marked to preserve all analyses. Fixes #58474 (https://github.com/llvm/llvm-project/issues/58474)	2022-10-19 12:11:03 -05:00
Sander de Smalen	36864d47d6	[AArch64] Fix minor issue introduced in D135950. The Key for the SubtargetMap had the StreamingSVEModeDisabled in the wrong place. This change is non-functional, since the string (key) is still unique.	2022-10-19 17:01:41 +00:00
Caroline Concatto	2ecbe8c38c	[AArch64] SME2 Single-multi vector ternary int/FP 2 and 4 registers This patch adds the assembly/disassembly for the following instructions: For INT: ADD(array results, multiple and single vector): Add replicated single vector to multi-vector with ZA array vector results. SUB(array results, multiple and single vector): Subtract replicated single vector from multi-vector with ZA array vector results. For FP: FMLA (multiple and single vector): Multi-vector floating-point fused multiply-add by vector. FMLS (multiple and single vector): Multi-vector floating-point multiply-subtract long by vector. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 The Matrix Operand has 2 new sizes 32(.s) and 64(.d) bits (MatrixOp32 and MatrixOp64) Depends on: D135448 Depends on: D135952 Differential Revision: https://reviews.llvm.org/D135455	2022-10-19 17:49:48 +01:00
Sander de Smalen	137459aff6	[AArch64][SME] Disable (SLP|Loop)Vectorizer when function may be executed in streaming mode. When the SME attributes tell that a function is or may be executed in Streaming SVE mode, we currently need to be conservative and disable _any_ vectorization (fixed or scalable) because the code-generator does not yet support generating streaming-compatible code. Scalable auto-vec will be gradually enabled in the future when we have confidence that the loop-vectorizer won't use any SVE or NEON instructions that are illegal in Streaming SVE mode. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D135950	2022-10-19 16:42:20 +00:00
Phoebe Wang	bc1819389f	[X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics This is an alternative to D120395 and D120411. Previously we used `__bfloat16` as a typedef of `unsigned short`. The name may give users the impression that it is a brand new type to represent BF16, so they may use it in arithmetic operations, and we don't have a good way to block that. To solve the problem, we introduced `__bf16` to X86 psABI and landed the support in Clang by D130964. Now we can solve the problem by switching intrinsics to the new type. Reviewed By: LuoYuanke, RKSimon Differential Revision: https://reviews.llvm.org/D132329	2022-10-19 23:47:04 +08:00
Jay Foad	f0ca946bf9	[AMDGPU] New helper function SIInsertWaitcnts::getVmemWaitEventType This just commons up and simplifies some logic that was repeated in SIInsertWaitcnts::updateEventWaitcntAfter. NFCI. Differential Revision: https://reviews.llvm.org/D136253	2022-10-19 16:22:50 +01:00
Joe Nash	ad6698562c	[AMDGPU] V_LDEXP_F16 encoding fix and doc update. The amdgcn.ldexp.* intrinsics take an i32 value as src1. The V_LDEXP_F16 instruction considers src1 an f16 operand, and therefore src1 is implicitly truncated to 16 bits when lowering to that instruction from the intrinsic. This is unlikely to result in an error in practice because values that large are not useful. The operand class of src1 in the True16 version of the instruction has been corrected to encode correctly on GFX11. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D136195	2022-10-19 09:52:53 -04:00
dbakunevich	fecfd01252	[Verifier] Allow undef/poison token argument to llvm.experimental.gc.result As part of the optimization in the unreachable code, we remove tokens, thereby replacing them with undef/poison in intrinsics. But the verifier then fails an assertion when it sees a poison token in unreachable code, and that failure is incorrect. bug: 57871, https://github.com/llvm/llvm-project/issues/57871 Differential Revision: https://reviews.llvm.org/D134427	2022-10-19 20:51:21 +07:00
Florian Hahn	d72fcee8f4	[VPlan] Add VPValue::isDefinedOutsideVectorRegions helper (NFC). @Ayal suggested a better named helper than using `!getDef()` to check if a value is invariant across all parts. The property we are using here is that the VPValue is defined outside any vector loop region. There's a TODO left to handle recipes defined in pre-header blocks. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133666	2022-10-19 13:20:30 +01:00
Simon Pilgrim	9708d88017	Revert rG42230efccf8fe1185be5fa6c23dce0a8183d6ec9 "[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1" @foad was right - this isn't actually going to help with D136042 as much as hoped, we need a better AMDGPU-specific solution as other targets are likely to make use of it	2022-10-19 12:07:41 +01:00
Florian Hahn	1625224fbb	[SCEV] Replace assert with returning CouldNotComp in computeMaxBECountForLT. This patch removes the bail out for signed predicates and non-positive strides in howManyLessThans and updates computeMaxBECountForLT to return SCEVCouldNotCompute for signed predicates with negative strides. AFAICT bail-out was only added because computeMaxBECountForLT may not handle negative signed strides correctly. Instead of not calling computeMaxBECountForLT at all because we bail out earlier, we can instead return SCEVCouldNotCompute in computeMaxBECountForLT. The max backedge taken count will be computed as the max value of the symbolic backedge taken count. This improves precision in cases where we can compute symbolic backedge taken counts and also fixes a crash. Fixes #57818. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D135667	2022-10-19 11:24:10 +01:00
bipmis	38f3e44997	[AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads. This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns. Differential Revision: https://reviews.llvm.org/D135137	2022-10-19 11:22:58 +01:00
Serge Pavlov	0dec5e164f	Keep configuration file search directories in ExpansionContext. NFC Class ExpansionContext encapsulates options for search and expansion of response files, including configuration files. With this change the directories which are searched for configuration files are also stored in ExpansionContext. Differential Revision: https://reviews.llvm.org/D135439	2022-10-19 17:20:14 +07:00
Simon Pilgrim	42230efccf	[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1 Helps with some of the AMDGPU regressions identified in D136042 where we were losing signed BFE patterns after sinking shifts behind logic ops. Differential Revision: https://reviews.llvm.org/D136081	2022-10-19 11:18:49 +01:00
Jay Foad	ea09a426a9	[AMDGPU] Assume getDefIgnoringCopies will succeed. NFC. getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on valid MIR, so don't bother to check for failure. Differential Revision: https://reviews.llvm.org/D136238	2022-10-19 11:10:00 +01:00
Caroline Concatto	579ca5e7e1	[AArch64] Replace sme-i64 by sme-i16i64 and sme-f64 by sme-f64f64 The names in developer.arm for these SME features are: HaveSMEI16I64 and HaveSMEF64F64 so the new flag names are consistent with the documentation page Reviewed By: sdesmalen, c-rhodes Differential Revision: https://reviews.llvm.org/D135974	2022-10-19 10:56:46 +01:00
Juan Manuel MARTINEZ CAAMAÑO	bb24b2c610	[AMDGPU][Backend] Fix use-after-free in AMDGPUReleaseVGPRs::isLastVGPRUseVMEMStore Reviewed By: jpages, arsenm Differential Revision: https://reviews.llvm.org/D134641	2022-10-19 04:38:16 -05:00
Nikita Popov	747f27d97d	[AA] Rename getModRefBehavior() to getMemoryEffects() (NFC) Follow up on D135962, renaming the method name to match the new type name.	2022-10-19 11:03:54 +02:00
Nikita Popov	1a9d9823c5	[AA] Rename uses of FunctionModRefBehavior (NFC) Followup to D135962 to rename remaining uses of FunctionModRefBehavior to MemoryEffects. Does not touch API names yet, but also updates variables names FMRB/MRB to ME, to match the new type name.	2022-10-19 10:54:47 +02:00
luxufan	82c820b95c	[RISCV] Enable the LocalStackSlotAllocation pass support For RISC-V, load/store instructions (excluding vector load/store) only have a 12-bit immediate operand. If the offset is out of range, a temporary register must be used to make up this offset. If several such offsets have a small (IsInt<12>) relative offset between them, the LocalStackSlotAllocation pass can pick a value as the frame base register's value, and replace each original offset with this register's value plus the relative offset. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98101	2022-10-19 16:15:14 +08:00
Freddy Ye	3ee58e2f35	[X86] Add WRMSRNS instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D135935	2022-10-19 13:04:11 +08:00
Craig Topper	7a4e56acac	[RISCV] Add an early out to lowerVECTOR_SHUFFLEAsVSlidedown. NFC If Mask[0] is 0, then we're never going to match a slidedown. If we get through the for loop, then it's an identity mask which should have already been optimized out. Otherwise it's some non-contiguous mask that will fail out of the loop. Might as well not bother entering the loop.	2022-10-18 21:35:15 -07:00
Freddy Ye	e3df4ba9d2	[X86] Add MSRLIST instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: skan, RKSimon Differential Revision: https://reviews.llvm.org/D135934	2022-10-19 10:35:42 +08:00
chenglin.bi	b18293edc3	[MC][COFF] Add COFF section flag "Info" Currently, we do not parse the section flag `Info` in asm files. When we emit a section with the Info flag to asm and then compile the asm to obj, we lose the Info flag for the section. The motivation for this change is ARM64EC's hybmp$x section. If we lose the Info flag, MSVC link will report a warning: `warning LNK4078: multiple '.hybmp' sections found with different attributes` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D136125	2022-10-19 10:32:58 +08:00
Weining Lu	771aee91c8	Reland "[LoongArch] Fix codegen of atomicrmw nand" Fix invalid RISCV-like MI being emitted for performing the `not` operation: the LoongArch `xori` zero-extends the immediate, hence is not equivalent to RISCV `xori`. The LoongArch `not` is a `nor` with zero. Patch by lrzlin (Lin Runze). Differential Revision: https://reviews.llvm.org/D136021	2022-10-19 10:05:35 +08:00
Chen Zheng	df9d60af1f	[PowerPC] handle more than two predecessors loop header in ctrloop pass After ISEL, the "valid" loop header which has two predecessors (one is preheader and the other one is latch) may be transformed to have more than two predecessors by some optimizations, like tail duplicator, if the old header's successor(will be changed to new header) is a sub loop. The predecessors of the new loop header are preheader, loop latch and the loop latch(es) of the sub loop(old header's successor). Before the patch, ctrloop pass assumes two predecessors for candidate loop header. This patch fixes this case. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D135846	2022-10-19 01:11:58 +00:00
Weining Lu	0f374ca5cd	Revert "[LoongArch] Fix codegen of atomicrmw nand" This reverts commit `9572406bbc`. The author name is wrong.	2022-10-19 07:56:23 +08:00
Alexey Bataev	087dadfd37	[SLP]Generalize cost model. Generalized the cost model estimation. Improved cost model estimation for repeated scalars (no need to count their cost anymore), improved cost model for extractelement instructions. cpu2017 511.povray_r 0.57 520.omnetpp_r -0.98 521.wrf_r -0.01 525.x264_r 3.59 <+ 526.blender_r -0.12 531.deepsjeng_r -0.07 538.imagick_r -1.42 Geometric mean: 0.21 Differential Revision: https://reviews.llvm.org/D115757	2022-10-18 11:55:59 -07:00
Eli Friedman	d6481dc88c	[AArch64][Windows] Add MC support for save_any_reg. Representing this as 12 separate operations is a bit ugly, but trying to represent the different modes using a bitfield seemed worse. Differential Revision: https://reviews.llvm.org/D135417	2022-10-18 11:45:27 -07:00
Alexey Bataev	62267e8de0	Revert "[SLP]Generalize cost model." This reverts commit `f12fb91188` and `f5c747bfbe` to fix detected non-initialized var use.	2022-10-18 11:25:59 -07:00
Sjoerd Meijer	f7c42a278b	Revert "Recommit "[LoopFlatten] Enable it by default"" This reverts commit `5b9597f59a`. A miscompilation was reported: https://github.com/llvm/llvm-project/issues/58441 Reverting this while I look at that.	2022-10-18 23:36:36 +05:30
Alexey Bataev	f5c747bfbe	[SLP][NFC]Fix a warning for ?: with enum/unsigned, NFC.	2022-10-18 10:08:05 -07:00
Krzysztof Parzyszek	6a8cfe9a72	[Hexagon] Use shifts by scalar for funnel shifts by scalar HVX has vector shifts by a scalar register. Use those in the expansions of funnel shifts where profitable.	2022-10-18 09:49:17 -07:00
Chris Bieneman	6e05c8dfc8	[DX] Create globals for DXContainer parts DXContainer files have a handful of sections that need to be written. This adds a pass to write the section data into IR globals, and writes the shader flag data into a global. The test cases here verify that the shader flags are correctly written from the IR into the global and emitted to the DXContainer. This change also fixes a bug in the MCDXContainerWriter, where the size of the dxbc::ProgramHeader was not being included in the part offset calculations. This is verified to be working by the new testcases where obj2yaml can properly dump part data for parts after the DXIL part. Resolves issue #57742 (https://github.com/llvm/llvm-project/issues/57742) Reviewed By: python3kgae Differential Revision: https://reviews.llvm.org/D135793	2022-10-18 11:48:08 -05:00
Florian Hahn	c65513444b	[IndVars] Forget SCEV for instruction and users before replacing it. Extra invalidation is needed here to clear stale values to fix a verification failure. Fixes #58440.	2022-10-18 17:38:14 +01:00
Mingming Liu	34d18fd241	[AArch64] Enhance bit-field-positioning op matcher to see through 'any_extend' for pattern 'and(any_extend(shl(val, N)), shifted-mask)' Before this patch (and refactor patch D135843), isBitfieldPositioningOp won't handle "and(any_extend(shl(val, N)), shifted-mask)" (bail out if AND op is not SHL). After this patch, isBitfieldPositioningOp will see through "any_extend" to find "shl" to find possible bit-field-positioning nodes. https://gcc.godbolt.org/z/3ncGKbGW6 is a four-liner LLVM IR that could be optimized to UBFIZ (see added test case test_and_extended_shift_with_imm in llvm/test/CodeGen/AArch64/bitfield-insert.ll). One existing test case also improves. Differential Revision: https://reviews.llvm.org/D135852	2022-10-18 09:07:14 -07:00
Han-Kuan Chen	615af94dc2	[RISCV] Lower VECTOR_SHUFFLE to VSLIDEDOWN_VL. Differential Revision: https://reviews.llvm.org/D136136	2022-10-18 08:58:39 -07:00
Anton Sidorenko	1978b4d968	[MachineCombiner][RISCV] Enable MachineCombiner for RISCV Initial implementation to match basic FP reassociation patterns. Differential Revision: https://reviews.llvm.org/D135264	2022-10-18 18:56:32 +03:00
Alexey Bataev	f12fb91188	[SLP]Generalize cost model. Generalized the cost model estimation. Improved cost model estimation for repeated scalars (no need to count their cost anymore), improved cost model for extractelement instructions. cpu2017 511.povray_r 0.57 520.omnetpp_r -0.98 521.wrf_r -0.01 525.x264_r 3.59 <+ 526.blender_r -0.12 531.deepsjeng_r -0.07 538.imagick_r -1.42 Geometric mean: 0.21 Differential Revision: https://reviews.llvm.org/D115757	2022-10-18 08:49:32 -07:00
Arthur Eubanks	743087fb63	Port print-cfg-sccs to new pass manager This is actually used, see https://discourse.llvm.org/t/use-print-callgrapg-sccs-from-opt/65782. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135718	2022-10-18 08:47:08 -07:00
Arthur Eubanks	6219ec07c6	[SROA] Don't speculate phis with different load user types Fixes an SROA crash. Fallout from opaque pointers since with typed pointers we'd bail out at the bitcast. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D136119	2022-10-18 08:44:13 -07:00
Sanjay Patel	44b7da89d7	[InstCombine] fmul nnan X, 0.0 --> copysign(0.0, X) https://alive2.llvm.org/ce/z/ybgM5F Differential Revision: https://reviews.llvm.org/D136166	2022-10-18 11:34:02 -04:00
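
The last fold in the listing above, `fmul nnan X, 0.0 --> copysign(0.0, X)`, can be sanity-checked numerically. The sketch below is not LLVM code. It is a small Python check that assumes IEEE-754 doubles and only tests finite inputs: the `nnan` flag rules out NaN inputs and NaN results, so infinities (where `inf * 0.0` is NaN) are left out. The helper names `bits`, `fmul_zero`, and `folded` are hypothetical and exist only for this illustration.

```python
import math
import struct


def bits(x: float) -> bytes:
    """Return the raw IEEE-754 encoding so that 0.0 and -0.0 compare as different."""
    return struct.pack(">d", x)


def fmul_zero(x: float) -> float:
    """Original expression: multiply by positive zero."""
    return x * 0.0


def folded(x: float) -> float:
    """Folded expression: a zero that carries the sign of x."""
    return math.copysign(0.0, x)


# Finite, non-NaN samples only (illustrative values, including both signed zeros).
samples = [1.5, -1.5, 0.0, -0.0, 1e308, -1e-308]

# For each sample, record whether both forms produce the same bit pattern.
results = [(x, bits(fmul_zero(x)) == bits(folded(x))) for x in samples]

for x, same in results:
    print(f"x={x!r}: identical={same}")
```

Comparing encoded bytes rather than using `==` matters here, because `0.0 == -0.0` is true in floating-point comparison and would hide a sign mismatch.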

162738 Commits