llvm-project

Commit Graph

Author	SHA1	Message	Date
Min-Yih Hsu	fc86e6d188	[ARM][disassembler] Fix incorrect number of MCOperands generated by the disassembler Try to fix bug 49974. This patch fixes two issues: 1. BL does not use predicate (BL_pred is the predicate version of BL), so we shouldn't add predicate operands in DecodeBranchImmInstruction. 2. Inside DecodeT2AddSubSPImm, we shouldn't add predicate operands into the MCInst because ARMDisassembler::AddThumbPredicate will do that for us. However, we should handle the CC-out operand for t2SUBspImm and t2ADDspImm. Differential Revision: https://reviews.llvm.org/D100585	2021-04-25 11:55:10 -07:00
Simon Pilgrim	535df472b0	Revert rG2149aa73f640c96 "[X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841)" This might be the cause of some msan build failures - I don't have access to a msan build right now, so this is a speculative revert.	2021-04-25 12:45:07 +01:00
Simon Pilgrim	2149aa73f6	[X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841) XADD has the same EFLAGS behaviour as ADD	2021-04-25 12:02:33 +01:00
Xiang1 Zhang	c3f95e9197	[X86] Refine AMX fast register allocation	2021-04-25 14:20:53 +08:00
Xiang1 Zhang	3b8ec86fd5	[X86] Support AMX fast register allocation Differential Revision: https://reviews.llvm.org/D100026	2021-04-25 09:45:41 +08:00
RamNalamothu	0ce723cb22	[NFC] Refactor how CFI section types are represented in AsmPrinter In terms of readability, the `enum CFIMoveType` didn't better document what it intends to convey i.e. the type of CFI section that gets emitted. Reviewed By: dblaikie, MaskRay Differential Revision: https://reviews.llvm.org/D76519	2021-04-24 23:29:42 +05:30
David Green	af342f7240	[AArch64] Enable UseAA globally in the AArch64 backend This is similar to D69796 from the ARM backend. We remove the UseAA feature, enabling it globally in the AArch64 backend. This should in general be an improvement allowing the backend to reorder more instructions in scheduling and codegen, and enabling it by default helps to improve the testing of the feature, not making it cpu-specific. A debugging option is added instead for testing. Differential Revision: https://reviews.llvm.org/D98781	2021-04-24 17:51:50 +01:00
David Green	7255d1f54f	[ARM] Format ARMISD node definitions. NFC This clang-formats the list of ARMISD nodes. Usually this is something I would avoid, but these cause problems with formatting every time new nodes are added. The list in getTargetNodeName also makes use of MAKE_CASE macros, as other backends do.	2021-04-24 14:50:32 +01:00
Craig Topper	bd28d86119	[RISCV] Removed getLMULForFixedLengthVector. Use getContainerForFixedLengthVector and getRegClassIDForVecVT to get the register class to use when making a fixed vector type legal. Inline it into the other two call sites. I'm looking into using fractional lmul for fixed length vectors and getLMULForFixedLengthVector returned an integer making it unable to express this. I considered returning the LMUL enum, but that seemed like it would introduce more complexity to convert it for use.	2021-04-23 16:56:46 -07:00
Craig Topper	bcf321015b	[RISCV] Move getLMULForFixedLengthVector out of RISCVSubtarget. Make it a static function in RISCVISelLowering, the only place it is used. I think I'm going to make this return fractional LMULs in some cases so I'm sorting out where it should live before I start making changes.	2021-04-23 15:06:20 -07:00
Craig Topper	baa107f018	[RISCV] Only expose one interface for getContainerForFixedLengthVector in the RISCVTargetLowering class We can have RISCVISelDAGToDAG.cpp call the VT only version by finding the RISCVTargetLowering object via the Subtarget. Make the static versions just global static functions in RISCVISelLowering that can be called by static functions in that file.	2021-04-23 15:06:10 -07:00
Mitch Phillips	caea37b37e	Revert "[X86][AMX] Try to hoist AMX shapes' def" This reverts commit `90118563ad`. Reason: Broke the MSan buildbots. https://lab.llvm.org/buildbot/#/builders/5/builds/6967/steps/9/logs/stdio More details can be found in the original phabricator review: https://reviews.llvm.org/D101067	2021-04-23 10:42:26 -07:00
Craig Topper	3064a63b2b	[RISCV] Remove GetVRegNoV0 from the output register class of masked compare pseudo instructions. These instructions are allowed to write v0 when they are masked. We'll still never use v0 because of the earlyclobber constraint so this doesn't really help anything. It just makes the definitions correct. While I was there remove an unused multiclass I noticed. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D101118	2021-04-23 09:33:29 -07:00
Craig Topper	fae1d31c09	[RISCV] Have assembler check that the temp register is different than dest register for vmsgeu.vx pseudo. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D101015	2021-04-23 09:33:29 -07:00
Sebastian Neubauer	3366d81153	[AMDGPU] Save WWM registers in functions The values of registers in inactive lanes need to be saved during function calls. Save all registers used for whole wave mode, similar to how it is done for VGPRs that are used for SGPR spilling. Differential Revision: https://reviews.llvm.org/D99429 Reapply with fixed tests on Windows.	2021-04-23 18:09:24 +02:00
Nemanja Ivanovic	6725b90a02	[PowerPC] Add vec_ctsl and vec_ctul to altivec.h These are added for compatibility with XLC. They are similar to vec_cts and vec_ctu except that the result is a doubleword vector regardless of the parameter type.	2021-04-23 11:03:38 -05:00
Simon Pilgrim	043bc88dba	[CostModel][X86] Improve v2f32 fadd reduction cost This was being reported as a similar cost to v4f32 when it's a lot cheaper (just a shufps+addps).	2021-04-23 16:56:13 +01:00
Sander de Smalen	f9a50f04ba	[TTI] NFC: Change getIntImmCost[Inst\|Intrin] to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100565	2021-04-23 16:06:36 +01:00
Sander de Smalen	43ace8b5ce	[TTI] NFC: Change getScalingFactorCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100564	2021-04-23 16:06:36 +01:00
Sander de Smalen	008a072ded	[TTI] NFC: Change getMemcpyCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100563	2021-04-23 16:06:35 +01:00
Sander de Smalen	e0edfa052f	[TTI] NFC: Change getAddressComputationCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100561	2021-04-23 16:06:35 +01:00
dfukalov	9ab17a60eb	[TTI] NFC: Use InstructionCost to store ScalarizationCost in IntrinsicCostAttributes. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D101151	2021-04-23 18:02:00 +03:00
Joe Ellis	c19c0ad681	[AArch64][SVE] Fix bug in lowering of fixed-length integer vector divides The function AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE previously assumed the operands were full vectors, but this is not always true. This function would produce bogus results if the division operands are not full vectors, resulting in miscompiles when dividing 8-bit or 16-bit vectors. The fix is to perform an extend + div + truncate for non-full vectors, instead of the usual unpacking and unzipping logic. This is an additive change which reduces the non-full integer vector divisions to a pattern recognised by the existing lowering logic. For future reference, an example of code that would miscompile before this patch is below: int8_t foo(unsigned N, int8_t *a, int8_t *b, int8_t *c) { int8_t result = 0; for (int i = 0; i < N; ++i) { result += (a[i] / b[i]) / c[i]; } return result; } Differential Revision: https://reviews.llvm.org/D100370	2021-04-23 14:55:10 +00:00
Sebastian Neubauer	22d99cb63f	Revert "[AMDGPU] Save WWM registers in functions" This reverts commit `91464c30bf`. Seems to break tests on windows.	2021-04-23 16:38:50 +02:00
Krzysztof Parzyszek	8ebdb58aac	[Hexagon] Remove redundant HVX intrinsic selection patterns, NFC Deleted HexagonMapAsm2IntrinV65.gen.td that wasn't included anywhere, moved V6_vrmpy_rtt patterns to HexagonIntrinsics.td. Touch CMakeLists.txt to force re-cmake (somehow the unused file was listed as a dependency in the generated makefiles).	2021-04-23 09:28:08 -05:00
Sebastian Neubauer	91464c30bf	[AMDGPU] Save WWM registers in functions The values of registers in inactive lanes need to be saved during function calls. Save all registers used for whole wave mode, similar to how it is done for VGPRs that are used for SGPR spilling. Differential Revision: https://reviews.llvm.org/D99429	2021-04-23 16:09:31 +02:00
Simon Pilgrim	7b32e8b96a	[X86] combineSetCCAtomicArith - pull out repeated ops. NFCI. Reduces diff in D101074	2021-04-23 14:19:24 +01:00
Matt Arsenault	b58332774f	AMDGPU: Fix assert on inline asm on gfx90a This was assuming all mayLoad instructions have one def.	2021-04-23 09:00:25 -04:00
Fraser Cormack	83b8f8da82	[RISCV] Custom lower vector F(MIN\|MAX)NUM to vf(min\|max) This patch adds support for both scalable- and fixed-length vector code lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent RVV instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101035	2021-04-23 12:22:15 +01:00
Wang, Pengfei	151e244fe6	[X86][AMX][NFC] Make comparison operators to be complete The previous D101039 didn't fix the SmallSet insertion issue, because we always return false for the comparison between 2 different nonnull BBs. This patch makes the comparison complete by comparing `MBB` first, so that we can always get an invariant order from a single operator.	2021-04-23 17:38:54 +08:00
Daniel Kiss	b1f463dcae	[AArch64] Fix for BTI landing pad insertion with PAC-RET+bkey. EMITBKEY is emitted for PAC-RET+bkey, which is not a machine instruction. PR: 49957 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D100996	2021-04-23 10:07:25 +02:00
Wang, Pengfei	53673fd1bf	[X86][AMX][NFC] Avoid assert for the same immediate value The previous condition in the assert was overly strict. We ought to allow the same immediate value being loaded more than once. The intention of the assert is to check whether the same AMX register uses multiple different immediate shapes. So this fix is supposed to be NFC. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D101124	2021-04-23 12:17:00 +08:00
Wang, Pengfei	90118563ad	[X86][AMX] Try to hoist AMX shapes' def We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always true, both because users don't always follow the AMX API model and because of optimizations. This patch adds a mechanism that tries to hoist AMX shapes' def as well. It only hoists shapes inside a BB; we can improve it for cases across BBs in future. Currently, it only hoists shapes whose sources are all defined above the first AMX instruction. We can improve this for the case where the only source defined below the AMX instruction moves an immediate value to a register. Differential Revision: https://reviews.llvm.org/D101067	2021-04-23 12:17:00 +08:00
Wang, Pengfei	e8bce83996	[X86] Enable compilation of user interrupt handlers. Add __uintr_frame structure and use UIRET instruction for functions with x86 interrupt calling convention when UINTR is present. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D99708	2021-04-23 11:43:57 +08:00
Matt Arsenault	ed633a1daa	AMDGPU: Restore atomic fp feature on FP atomic instruction definitions `9931b1f7a4` switched this to checking for the two specific subtargets, instead of the dedicated feature. This broke supporting functions which force added the feature when emitting targets that do not actually support them. This still does not work for the targets that use the gfx6/7 or gfx10 encodings.	2021-04-22 21:32:01 -04:00
Levy Hsu	b49337bbb9	[RISCV] [1/2] Add IR intrinsic for Zbp extension RV32/64: grev grevi gorc gorci shfl shfli unshfl unshfli RV64 ONLY: grevw greviw gorcw gorciw shflw shfli (For non-existing shfliw) unshfli (For non-existing unshfliw) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100830	2021-04-22 16:34:51 -07:00
Heejin Ahn	c390621aeb	[WebAssembly] Fix fixEndsAtEndOfFunction for delegate Background: CFGStackify's [[ `398f253400/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp (L1481-L1540)` \| fixEndsAtEndOfFunction ]] fixes block/loop/try's return type when the end of function is unreachable and the function return type is not void. So if a function returns i32 and `block`-`end` wraps the whole function, i.e., the `block`'s `end` is the last instruction of the function, the `block`'s return type should be i32 too: ``` block i32 ... end end_function ``` If there are consecutive `end`s, this signature has to be propagated to those blocks too, like: ``` block i32 ... block i32 ... end end end_function ``` This applies to `try`-`end` too: ``` try i32 ... catch ... end end_function ``` In case of `try`, we not only follow consecutive `end`s but also follow `catch`, because for the type of the whole `try` to be i32, both `try` and `catch` parts have to be i32: ``` try i32 ... block i32 ... end catch ... block i32 ... end end end_function ``` --- Previously we only handled consecutive `end`s or `end` before a `catch`. But now we have `delegate`, which serves like `end` for `try`-`delegate`. So we have to follow `delegate` too and mark its corresponding `try` as i32 (the function's return type): ``` try i32 ... catch ... try i32 ;; Here ... delegate N end end_function ``` Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D101036	2021-04-22 15:32:00 -07:00
Heejin Ahn	b3e88ccba7	[WebAssembly] Serialize params/results in MachineFunctionInfo This adds support for YAML serialization of `Params` and `Results` fields in `WebAssemblyMachineFunctionInfo`. Types are printed as `MVT`'s string representation. This makes writing MIR tests easier. The tests added are testing simple parsing and printing of `params` / `results` fields under `machineFunctionInfo`. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D101029	2021-04-22 15:31:09 -07:00
Heejin Ahn	0b2bc69ba2	[WebAssembly] Put utility functions in Utils directory (NFC) This CL 1. Creates Utils/ directory under lib/Target/WebAssembly 2. Moves existing WebAssemblyUtilities.cpp\|h into the Utils/ directory 3. Creates Utils/WebAssemblyTypeUtilities.cpp\|h and puts type declarations and type conversion functions scattered in various places into this single place. It has been suggested several times that it is not easy to share utility functions between subdirectories (AsmParser, Disassembler, MCTargetDesc, ...). Sometimes we ended up [[ https://reviews.llvm.org/D92840#2478863 \| duplicating ]] the same function because of this. There are already other targets doing this: AArch64, AMDGPU, and ARM have Utils/ subdirectory under their target directory. This extracts the utility functions into a single directory Utils/ and makes them sharable among all passes in WebAssembly/ and its subdirectories. Also I believe gathering all type-related conversion functionalities into a single place makes it more usable. (Actually I was working on another CL that uses various type conversion functions scattered in multiple places, which became the motivation for this CL.) Reviewed By: dschuff, aardappel Differential Revision: https://reviews.llvm.org/D100995	2021-04-22 15:29:43 -07:00
Krzysztof Parzyszek	06234f758e	[Hexagon] Improve lowering of returns of i1 Emit explicit any-extend to avoid weird tstbit sequences.	2021-04-22 16:47:52 -05:00
Krzysztof Parzyszek	ab9521aaeb	[Hexagon] Use 'vnot' instead of 'not' in patterns with vectors 'not' expands to checking for an xor with a -1 constant. Since this looks for a ConstantSDNode it will never match for a vector. Co-authored-by: Craig Topper <craig.topper@sifive.com> Differential Revision: https://reviews.llvm.org/D100687	2021-04-22 15:36:20 -05:00
David Green	c0bf5929ee	[AArch64] Improve vector reverse lowering This improves the lowering of v8i16 and v16i8 vector reverse shuffles. Instead of going via a generic tbl it uses a rev64; ext pair, as already happens for v4i32. Differential Revision: https://reviews.llvm.org/D100882	2021-04-22 21:01:25 +01:00
Min-Yih Hsu	6f4ed8c0bd	[M68k][Disassembler][NFC] Decorate dump methods with LLVM_DUMP_METHOD And guard them with proper macro conditions. NFC.	2021-04-22 12:02:07 -07:00
Min-Yih Hsu	2ab6fa3dcd	[M68k][AsmParser][NFC] Remove redundant default cases Remove redundant default cases since all enumeration values have been covered (-Wcovered-switch-default). NFC.	2021-04-22 11:50:48 -07:00
Craig Topper	e01c419ecd	[RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi. These instructions don't really exist, but we have ways we can emulate them. .vv will swap operands and use vmsle(u).vv. .vi will adjust the immediate and use vmsgt(u).vi when possible. For .vx we need to use some of the multiple instruction sequences from the V extension spec. For unmasked vmsge(u).vx we use: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd For cases where mask and maskedoff are the same value then we have vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that requires a temporary so we use: vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt For other masked cases we use this sequence: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0 We trust that register allocation will prevent vd in vmslt{u}.vx from being v0 since v0 is still needed by the vmxor. Differential Revision: https://reviews.llvm.org/D100925	2021-04-22 10:44:38 -07:00
Craig Topper	d77d56acfd	[RISCV] Add missing tests for vector type for second operand of vmsgt and vmsgtu IR intrinsics. Refactor to use new multiclass instead of individual patterns. We already supported this due to SEW=64 on RV32, but we didn't have test cases for all the types we supported. Part of D100925	2021-04-22 10:44:38 -07:00
Craig Topper	9524a0553d	[RISCV] Support vector type for second operand of vmfge and vmfgt IR intrinsics. We don't have instructions for these, but can swap the operands to use vmfle/vmflt. This makes the IR interface more consistent and simplifies the frontend implementation. Part of D100925	2021-04-22 10:44:38 -07:00
Craig Topper	70254ccb69	[RISCV] Turn splat shuffles of vector loads into strided load with stride of x0. Implementations are allowed to optimize an x0 stride to perform fewer memory accesses. This is the case in SiFive cores. No idea if this is the case in other implementations. We might need a tuning flag for this. Reviewed By: frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D100815	2021-04-22 10:02:57 -07:00
Craig Topper	77f14c96e5	[RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32. Rather than splatting each separately and doing bit manipulation to merge them in the vector domain, copy the data to the stack and splat it using a strided load with x0 stride. At least on some implementations this vector load is optimized to not do a load for each element. This is equivalent to how we move i64 to f64 on RV32. I've only implemented this for the intrinsic fallbacks in this patch. I think we do similar splatting/shifting/oring in other places. If this is approved, I'll refactor the others to share the code. Differential Revision: https://reviews.llvm.org/D101002	2021-04-22 09:50:07 -07:00
Krzysztof Parzyszek	deda60fcaf	[Hexagon] Add HVX intrinsics for conditional vector loads/stores Intrinsics for the following instructions are added. The intrinsic name is "int_hexagon_<inst>[_128B]", e.g. int_hexagon_V6_vL32b_pred_ai for 64-byte version int_hexagon_V6_vL32b_pred_ai_128B for 128-byte version V6_vL32b_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4) V6_vL32b_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3) V6_vL32b_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2) V6_vL32b_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4) V6_vL32b_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3) V6_vL32b_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2) V6_vL32b_nt_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4):nt V6_vL32b_nt_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3):nt V6_vL32b_nt_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2):nt V6_vL32b_nt_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4):nt V6_vL32b_nt_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3):nt V6_vL32b_nt_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2):nt V6_vS32b_pred_ai if (Pv4) vmem(Rt32+#s4) = Vs32 V6_vS32b_pred_pi if (Pv4) vmem(Rx32++#s3) = Vs32 V6_vS32b_pred_ppu if (Pv4) vmem(Rx32++Mu2) = Vs32 V6_vS32b_npred_ai if (!Pv4) vmem(Rt32+#s4) = Vs32 V6_vS32b_npred_pi if (!Pv4) vmem(Rx32++#s3) = Vs32 V6_vS32b_npred_ppu if (!Pv4) vmem(Rx32++Mu2) = Vs32 V6_vS32Ub_pred_ai if (Pv4) vmemu(Rt32+#s4) = Vs32 V6_vS32Ub_pred_pi if (Pv4) vmemu(Rx32++#s3) = Vs32 V6_vS32Ub_pred_ppu if (Pv4) vmemu(Rx32++Mu2) = Vs32 V6_vS32Ub_npred_ai if (!Pv4) vmemu(Rt32+#s4) = Vs32 V6_vS32Ub_npred_pi if (!Pv4) vmemu(Rx32++#s3) = Vs32 V6_vS32Ub_npred_ppu if (!Pv4) vmemu(Rx32++Mu2) = Vs32 V6_vS32b_nt_pred_ai if (Pv4) vmem(Rt32+#s4):nt = Vs32 V6_vS32b_nt_pred_pi if (Pv4) vmem(Rx32++#s3):nt = Vs32 V6_vS32b_nt_pred_ppu if (Pv4) vmem(Rx32++Mu2):nt = Vs32 V6_vS32b_nt_npred_ai if (!Pv4) vmem(Rt32+#s4):nt = Vs32 V6_vS32b_nt_npred_pi if (!Pv4) vmem(Rx32++#s3):nt = Vs32 V6_vS32b_nt_npred_ppu if (!Pv4) vmem(Rx32++Mu2):nt = Vs32	2021-04-22 11:49:29 -05:00
Coplin, Jared	f3451162e8	[Hexagon] Unmasked and masked load pair to same base -> one load and selects	2021-04-22 10:15:46 -05:00
Joe Ellis	528ee161c9	[AArch64] Block tryCombineToBSL combines for vectors wider than NEON There are no patterns for the AArch64ISD::BSP ISD node for anything other than NEON vectors at the moment. As a result, if we hit these combines for vectors wider than a NEON vector (such as what we might get with fixed length SVE) we will fail to lower. This patch simply prevents us from attempting the combines if the input vector type is too wide. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D100961	2021-04-22 15:09:13 +00:00
Krzysztof Parzyszek	e8d0475472	Revert "[Hexagon] Masked and unmasked load to same base -> load and two selects" This reverts commit `96dc8d7e7d`. It breaks a few builds.	2021-04-22 09:06:18 -05:00
Tim Northover	2e72f6b5d8	AArch64: support mixed-size fp <-> int conversions in GlobalISel.	2021-04-22 15:03:17 +01:00
Tim Northover	ac1647cc80	AArch64: expand G_DIVREM operations in GlobalISel We don't have a specific instruction for these, so they should be expanded to whatever separate division & multiplication is needed.	2021-04-22 15:03:17 +01:00
Coplin, Jared	96dc8d7e7d	[Hexagon] Masked and unmasked load to same base -> load and two selects	2021-04-22 08:44:01 -05:00
Wang, Pengfei	aafb6d81cf	[X86][AMX][NFC] Remove assert for comparison between different BBs. SmallSet may use operator `<` when we insert MIRef elements, so we cannot limit the comparison between different BBs. We allow MIRef() to be less than any initialized MIRef object; otherwise, we always return false when comparing between different BBs. Differential Revision: https://reviews.llvm.org/D101039	2021-04-22 20:41:59 +08:00
Jay Foad	82d34fe2b3	Fix typo "beneficiates" in comments	2021-04-22 12:30:16 +01:00
Simon Pilgrim	439366817b	MipsSEFrameLowering.h - remove unused headers. NFCI.	2021-04-22 11:32:29 +01:00
Nemanja Ivanovic	092619cf6b	[PowerPC] Improve codegen for vector fp to int widening conversions We currently do not utilize instructions that convert single precision vectors to doubleword integer vectors. These conversions come up in code occasionally and this improvement allows us to open code some functions that need to be added to altivec.h.	2021-04-22 05:04:06 -05:00
Martin Storsjö	8000e1f578	[AArch64] Fix calling windows varargs with floats in fixed args from non-windows functions When inspecting the calling convention, for calling windows functions from a non-windows function, inspect the calling convention of the called function, not the caller. Also remove an unnecessary parameter to AArch64CallLowering OutgoingArgHandler. Differential Revision: https://reviews.llvm.org/D100890	2021-04-22 12:02:49 +03:00
Jay Foad	79cb3ba08f	[AMDGPU] SIWholeQuadMode: don't add duplicate implicit $exec operands STRICT_WWM and STRICT_WQM are already defined with Uses = [EXEC], so there is no need to add another implicit use of $exec when lowering them to V_MOV_B32 instructions. Differential Revision: https://reviews.llvm.org/D100969	2021-04-22 09:19:47 +01:00
Serge Pavlov	740962e5d0	[RISCV] Custom lowering of SET_ROUNDING Differential Revision: https://reviews.llvm.org/D91242	2021-04-22 15:04:55 +07:00
Craig Topper	58c5b4c2c3	[RISCV] Use TargetConstant for condition code of RISCVISD::SELECT_CC. The value is always an immediate and can never be in a register. This is the kind of thing TargetConstant is for. Saves a step in GenDAGISel to convert a Constant to a TargetConstant.	2021-04-21 23:08:52 -07:00
Serge Pavlov	6e63dfdae2	[RISCV] Custom lowering of FLT_ROUNDS_ Differential Revision: https://reviews.llvm.org/D90854	2021-04-22 11:39:15 +07:00
Craig Topper	f6d8cf7798	[RISCV] Teach lowerSPLAT_VECTOR_PARTS to detect cases where Hi is sign extended from Lo. This recognizes the case when Hi is (sra Lo, 31). We can use SPLAT_VECTOR_I64 rather than splatting the high bits and combining them in the vector register.	2021-04-21 20:24:23 -07:00
Matt Arsenault	987e52851e	AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies	2021-04-21 21:58:18 -04:00
Jessica Paquette	3011aa1aea	[AArch64][GlobalISel] Fix regbankselect for G_FCMP with vector destinations These should always go to a FPR, since they always use the vector registers. Differential Revision: https://reviews.llvm.org/D100885	2021-04-21 18:11:30 -07:00
Jessica Paquette	6cb7599078	[AArch64][GlobalISel] Mark some vector G_ABS cases as legal Each of the cases marked as legal here have an imported pattern in AArch64GenGlobalISel.inc. So, if we mark them as legal, we get selection for free. Technically this is only supposed to happen if we have NEON support. But, we fall back if we don't have that in the legalizer right now. I suppose it'd be better to have a FIXME so we can write the testcase when the time comes. (Plus, it'd just fall back in selection if NEON isn't available, so it's not wrong, I guess?) This fixes some fallbacks in the test suite. (Also use `isScalar` from LegalityPredicates.cpp while we're here just to tidy things a little bit.) Differential Revision: https://reviews.llvm.org/D100916	2021-04-21 18:10:40 -07:00
Craig Topper	023b243d1d	[RISCV] Clean up the spec version references around fmaxnum/fminnum. This previously made references to 2.3-draft which was a short lived version number in 2017. It was replaced by date based versions leading up to ratification. This patch uses the latest ratified version number and just says what the behavior is. Nothing here is in flux. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100878	2021-04-21 14:50:29 -07:00
Craig Topper	a8822caa1b	[RISCV] Temporary in vmsge(u).vx pseudo instructions can't be V0. This was checked in some asserts, but not enforced by the instruction matching. There's still a second bug that we don't check that vt and vd are different registers, but that will require custom checking. Differential Revision: https://reviews.llvm.org/D100928	2021-04-21 14:50:29 -07:00
Simon Pilgrim	a511b55cfd	[X86][SSE] getFauxShuffleMask - don't decode OR(SHUFFLE,SHUFFLE) containing UNDEFs. (PR50049) PR50049 demonstrated an infinite loop between OR(SHUFFLE,SHUFFLE) <-> BLEND(SHUFFLE,SHUFFLE) patterns. The UNDEF elements were allowing a combined shuffle mask to be widened which lost the undef element, resulting in us needing to use the BLEND pattern (as the undef element would need to be zero for the OR pattern). But then bitcast folds would re-expose the undef element allowing us to use OR again.	2021-04-21 18:47:00 +01:00
Stanislav Mekhanoshin	f9d0d0d7e0	[AMDGPU] Lower regbanks reassign threshold to 15000 Let it work on very small kernels only. Measurements showed the performance benefit is not worth the compile time. Differential Revision: https://reviews.llvm.org/D100904	2021-04-21 08:34:11 -07:00
dfukalov	a8b35e0f52	[TTI] NFC: Change getVectorSplitCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D100952	2021-04-21 17:32:02 +03:00
Anirudh Prasad	8f6185c713	[AsmParser][ms][X86] Fix possible misbehaviour in parsing of special tokens at start of string. - Previously, https://reviews.llvm.org/D72680 introduced a new attribute called `AllowSymbolAtNameStart` (in relation to the MAsmParser changes) in `MCAsmInfo.h` which (according to the comment in the header) allows the following behaviour: ``` /// This is true if the assembler allows $ @ ? characters at the start of /// symbol names. Defaults to false. ``` - However, the usage of this field in AsmLexer.cpp doesn't seem completely accurate* for a couple of reasons. ``` default: if (MAI.doesAllowSymbolAtNameStart()) { // Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]* if (!isDigit(CurChar) && isIdentifierChar(CurChar, MAI.doesAllowAtInName(), AllowHashInIdentifier)) return LexIdentifier(); } ``` 1. The Dollar and At tokens, when occurring at the start of the string, are treated as separate tokens (AsmToken::Dollar and AsmToken::At respectively) and not lexed as an Identifier. 2. I'm not too sure why `MAI.doesAllowAtInName()` is used when `AllowAtInIdentifier` could be used. For X86 platforms, afaict, this shouldn't be an issue, since the `CommentString` attribute isn't "@". (alternatively the call to the setter can be set anywhere else as needed). The `AllowAtInName` does have an additional important meaning, but in the context of AsmLexer, shouldn't mean anything different compared to `AllowAtInIdentifier` My proposal is the following: - Introduce 3 new fields called `AllowQuestionTokenAtStartOfString`, `AllowDollarTokenAtStartOfString` and `AllowAtTokenAtStartOfString` in MCAsmInfo.h which will encapsulate the previously documented behaviour of "allowing $, @, ? characters at the start of symbol names") - Introduce these fields where "$", "@" are lexed, and treat them as identifiers depending on whether `Allow[Dollar\|At]TokenAtStartOfString` is set. - For the sole case of "?", append it to the existing logic for treating a "default" token as an Identifier. z/OS (HLASM) will also make use of some of these fields in follow up patches. completely accurate* - This was based on the comments and the intended behaviour of the code. I might have completely misinterpreted it, and if that is the case my sincere apologies. We can close this patch if necessary, if there are no changes to be made :) Depends on https://reviews.llvm.org/D99374 Reviewed By: Jonathan.Crowther Differential Revision: https://reviews.llvm.org/D99889	2021-04-21 10:21:09 -04:00
Caroline Concatto	ca9b7e2e2f	[AArch64][SVE] Fix crash with icmp+select This patch changes the lowering of SELECT_CC from Legal to Expand for scalable vectors and adds support for scalable vectors in performSelectCombine. When selecting the nodes to lower in visitSELECT it checks if it is possible to use SELECT_CC in cases where SETCC is followed by SELECT. visitSELECT checks if SELECT_CC is legal or custom to replace SELECT by SELECT_CC. SELECT_CC used to be legal for scalable vectors, so the node changes to SELECT_CC. This used to crash the compiler as there is no support for SELECT_CC with scalable vectors. So now the compiler lowers to VSELECT instead of SELECT_CC. Differential Revision: https://reviews.llvm.org/D100485	2021-04-21 14:16:27 +01:00
Matt Arsenault	70ab76a81b	AMDGPU: Fix indirect tail calls Fix a selection error on uniform callees, and use a regular call if divergent.	2021-04-21 09:15:24 -04:00
Fraser Cormack	3f02d26943	[RISCV] Further fixes for RVV stack offset computation This patch fixes a case missed out by D100574, in which RVV scalable stack offset computations may require three live registers in the case where the offset's fixed component is 12 bits or larger and has a scalable component. Instead of adding an additional emergency spill slot, this patch further optimizes the scalable stack offset computation sequences to reduce register usage. By emitting the sequence to compute the scalable component before the fixed component, we can free up one scratch register to be reallocated by the sequence for the fixed component. Doing this saves one register and thus one additional emergency spill slot. Compare: $x5 = LUI 1 $x1 = ADDIW killed $x5, -1896 $x1 = ADD $x2, killed $x1 $x5 = PseudoReadVLENB $x6 = ADDI $x0, 50 $x5 = MUL killed $x5, killed $x6 $x1 = ADD killed $x1, killed $x5 versus: $x5 = PseudoReadVLENB $x1 = ADDI $x0, 50 $x5 = MUL killed $x5, killed $x1 $x1 = LUI 1 $x1 = ADDIW killed $x1, -1896 $x1 = ADD $x2, killed $x1 $x1 = ADD killed $x1, killed $x5 Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100847	2021-04-21 10:51:07 +01:00
David Sherwood	57ca65e21e	[AArch64] Add instruction costs for FP_TO_UINT and FP_TO_SINT with half types We were missing some instruction costs when converting vectors of floating point half types into integers, so I've added those here. I also manually generated assembly code for each FP->int case and looked at the number of instructions generated, which meant adjusting some of the existing costs too. I've updated an existing test to reflect the new costs: Analysis/CostModel/AArch64/sve-fptoi.ll Differential Revision: https://reviews.llvm.org/D99935	2021-04-21 09:39:45 +01:00
Zakk Chen	ad0fe5db2f	[RISCV][MC] Mask load should not have VMConstraint. Add a test, dest register could be v0. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100825	2021-04-21 15:21:37 +08:00
Serge Pavlov	d20a2376d8	[RISCV] Introduce floating point control and state registers New registers FRM, FFLAGS and FCSR were defined. They represent corresponding system registers. The new registers are necessary to properly order floating point instructions in non-default modes. Differential Revision: https://reviews.llvm.org/D99083	2021-04-21 12:55:30 +07:00
Zi Xuan Wu	ca31b43ae8	[NFC][CSKY] Re-sort the instruction description in td Re-sort the instruction description in td to make it easy to upstream more instructions and add predicts later.	2021-04-21 12:36:07 +08:00
Sam Clegg	103956170b	[WebAssembly] Update README. NFC. This is just a cleanup of the very high level stuff. I'm sure there is more to update here but I'll leave that to others and/or a followup. Differential Revision: https://reviews.llvm.org/D100888	2021-04-20 16:59:08 -07:00
Philip Reames	4824d876f0	Revert "Allow invokable sub-classes of IntrinsicInst" This reverts commit `d87b9b81cc`. Post commit review raised concerns, reverting while discussion happens.	2021-04-20 15:38:38 -07:00
Philip Reames	d87b9b81cc	Allow invokable sub-classes of IntrinsicInst It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked. This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke. After this lands and has baked for a couple days, planned cleanups: Make GCStatepointInst an IntrinsicInst subclass. Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor. Do the same in SelectionDAG. Do the same in FastISEL. Differential Revision: https://reviews.llvm.org/D99976	2021-04-20 15:03:49 -07:00
Sam Clegg	d2de2d1724	[WebAssembly] Remove unused known_gcc_test_failures.txt. NFC Differential Revision: https://reviews.llvm.org/D100887	2021-04-20 14:07:25 -07:00
Alexey Bataev	673e2f1b70	[COST][AARCH64] Improve cost of reverse shuffles for AArch64. Introduced the cost of the reverse shuffles for AArch64, currently just copied the costs for PermuteSingleSrc. Differential Revision: https://reviews.llvm.org/D100871	2021-04-20 13:47:56 -07:00
Jon Roelofs	167da6c9e8	[AArch64][GlobalISel] Clarify fallback debug print ... to only print when that fallback actually happens.	2021-04-20 12:41:14 -07:00
Thomas Lively	693d767c60	[WebAssembly] More codegen for f64x2.convert_low_i32x4_{s,u} `af7925b4dd` added a custom DAG combine for recognizing fp-to-ints of extract_subvectors that could be lowered to f64x2.convert_low_i32x4_{s,u} instructions. This commit extends the combines to recognize equivalent extract_subvectors of fp-to-ints as well. Differential Revision: https://reviews.llvm.org/D100790	2021-04-20 12:37:13 -07:00
Simon Pilgrim	2a419a0b99	[X86][SSE] combineX86ShuffleChain - check if we're blending with zero into already zero elements Add a SelectionDAG::MaskedElementsAreZero helper that wraps SelectionDAG::MaskedValueIsZero testing for entirely zero vector elements	2021-04-20 17:09:49 +01:00
Jay Foad	ec8c61efdf	[AMDGPU] Allow multiple uses of the same literal In GFX10 VOP3 can have a literal, which opens up the possibility of two operands using the same literal value, which is allowed and only counts as one use of the constant bus. AMDGPUAsmParser::validateConstantBusLimitations already knew about this but SIInstrInfo::verifyInstruction did not. Differential Revision: https://reviews.llvm.org/D100770	2021-04-20 16:44:01 +01:00
Ahmed Bougacha	a0573b6c10	[AArch64] Bump apple-latest CPU alias to apple-a14.	2021-04-20 08:41:04 -07:00
Ahmed Bougacha	a8a3a43792	[AArch64] Add apple-m1 CPU, and default to it for macOS. apple-m1 has the same level of ISA support as apple-a14, so this is a straightforward mechanical change. However, that also means this inherits apple-a14's v8.5a+nobti quirkiness. rdar://68287159	2021-04-20 08:41:04 -07:00
David Green	21a8b9d9e9	[ARM] Limit PerformExtractEltToVMOVRRD to when f64 is legal. The generic SoftFloatVectorExtract.ll test was failing when run on arm machines, as it tries to create a f64 under soft float. Limit the transform to when f64 is legal. Also add a missing override, as reported in D100244.	2021-04-20 16:24:36 +01:00
Matt Arsenault	1cb8a9d595	AMDGPU/GlobalISel: Fix uitofp/sitofp with non-power-of-2 integers	2021-04-20 11:13:29 -04:00
Bradley Smith	b8b075d8d7	[AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types, and lower them to the appropriate SVE instructions. Additionally now that the MULH nodes are legal, integer divides can be expanded into a more performant code sequence. Differential Revision: https://reviews.llvm.org/D100487	2021-04-20 15:18:06 +01:00
David Green	48cef1fa8e	[ARM] Create VMOVRRD from adjacent vector extracts This adds a combine for extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the same time in a single instruction, and thanks to the other VMOVRRD folds we have added recently can help reduce the amount of executed instructions. Floating point types are very similar, but will include a bitcast to an integer type. This also adds a shouldRewriteCopySrc, to prevent copy propagation from DPR to SPR, which can break as not all DPR regs can be extracted from directly. Otherwise the machine verifier is unhappy. Differential Revision: https://reviews.llvm.org/D100244	2021-04-20 15:15:43 +01:00
Cullen Rhodes	f166d0db71	[AArch64][AsmParser] NFC: Remove unused ExtendOp struct Left over from `2625a993f9` when extend and shift were merged.	2021-04-20 13:45:09 +00:00
Sebastian Neubauer	4897effb14	[AMDGPU] Add TransVALU to gfx10 Instructions on the transcendental unit are executed in parallel to the normal VALU, so add this as an extra resource. This doesn't seem to have any effect, but it should be more correct. Differential Revision: https://reviews.llvm.org/D100123	2021-04-20 15:34:43 +02:00
Jay Foad	2aea830ec4	[AMDGPU] Use if instead of foreach in a few places. NFC.	2021-04-20 14:20:30 +01:00
Nemanja Ivanovic	03e7fefff8	[PowerPC] Canonicalize shuffles on big endian targets as well Extend shuffle canonicalization and conversion of shuffles fed by vectorized scalars to big endian subtargets. For big endian subtargets, loads and direct moves of scalars into vector registers put the data in the correct element for SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is narrower, the value still ends up in the wrong place - although a different wrong place than on little endian targets. This patch extends the combine that keeps values where they are if they feed a shuffle to big endian targets. Differential revision: https://reviews.llvm.org/D100478	2021-04-20 07:29:47 -05:00
Jay Foad	edea476142	[AMDGPU] Use simpler alternatives to !foldl. NFC.	2021-04-20 12:59:04 +01:00
hsmahesha	840c4e4e90	[AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D100773	2021-04-20 16:17:15 +05:30
Ben Shi	30e2c7be99	[RISCV] Refactor an optimization of addition with immediate Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100769	2021-04-20 18:04:25 +08:00
Joe Ellis	c91cd4f3bb	[AArch64][SVE][InstCombine] Replace last{a,b} intrinsics with extracts... when the predicate used by last{a,b} specifies a known vector length. For example: aarch64_sve_lasta(VL1, D) -> extractelement(D, #1) aarch64_sve_lastb(VL1, D) -> extractelement(D, #0) Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D100476	2021-04-20 10:01:33 +00:00
Fraser Cormack	b4a358a7ba	[RISCV] Fix missing emergency slots for scalable stack offsets This patch adds an additional emergency spill slot to RVV code. This is required as RVV stack offsets may require an additional register to compute. This patch includes an optimization by @HsiangKai <kai.wang@sifive.com> to reduce the number of registers required for the computation of stack offsets from 3 to 2. Otherwise we'd need two additional emergency spill slots. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100574	2021-04-20 09:59:41 +01:00
Qiu Chaofan	2432d80d3b	[PowerPC] Use mtvsrdd to put callee-saved GPR into VSR This patch exploits mtvsrdd instruction (available in ISA3.0+) to save two callee-saved GPR registers into a single VSR, making it more efficient. Reviewed By: jsji, nemanjai Differential Revision: https://reviews.llvm.org/D62565	2021-04-20 16:43:24 +08:00
Jay Foad	b22721f01a	[AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used Don't shrink VOP3 instructions if there are any uses of a carry-out operand, because the shrunken form of the instruction would write the carry-out to vcc instead of to a virtual register. Differential Revision: https://reviews.llvm.org/D100760	2021-04-20 09:17:52 +01:00
Qiu Chaofan	b820339752	[PowerPC] Support f128 under VSX This patch is the last one in backend to support fp128 type in pre-POWER9 subtargets with VSX, removing temporary option and updating remaining tests. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92374	2021-04-20 15:49:52 +08:00
Zi Xuan Wu	4bb60c285c	[CSKY 6/n] Add support branch and symbol series instruction This patch adds basic CSKY branch instructions and symbol address series instructions. Those two kinds of instruction have relationship between each other, and it involves much work about Fixups. For now, basic instructions are enabled except for disassembler support. We would support to generate basic codegen asm firstly and delay disassembler work later. Differential Revision: https://reviews.llvm.org/D95029	2021-04-20 15:36:49 +08:00
Zi Xuan Wu	4216389c26	[CSKY 5/n] Add support for all CSKY basic integer instructions except for branch series This patch adds basic CSKY integer instructions except for branch series such as bsr, br. It mainly includes basic ALU, load & store, compare and data move instructions. Branch series instructions need handle complex symbol operand as following patch later. Differential Revision: https://reviews.llvm.org/D94007	2021-04-20 15:36:49 +08:00
Zi Xuan Wu	8ba622bae1	[CSKY 4/n] Add basic CSKYAsmParser and CSKYInstPrinter This basic parser will handle basic instructions with register or immediate operands. With the addition of CSKYInstPrinter, we can now make use of lit tests. Differential Revision: https://reviews.llvm.org/D93798	2021-04-20 15:36:49 +08:00
Zakk Chen	d5fa71e9ec	[RISCV] Handle PseudoVRELOAD and PseudoVSPILL in getInstSizeInBytes. It's necessary to calculate correct instruction size because PseudoVRELOAD and PseudoVSPILL will be expanded into multiple instructions. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100702	2021-04-19 22:30:03 -07:00
Jun Ma	5c6ac3b4a2	[AArch64][SVE] Combine add and index_vector This patch tries to combine pattern add(index_vector(zero, step), dup(X)) into index_vector(X, step) TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D100107	2021-04-20 11:38:37 +08:00
Min-Yih Hsu	7ac461f6f7	[M68k] Put M68kDesc as the direct library dependency for disassembler M68kDisassembler should put M68kDesc as its direct library dependency since it uses logic related to code beads. Otherwise the build will fail when building LLVM libraries as shared objects (building LLVM libraries statically won't have this problem though)	2021-04-19 15:56:24 -07:00
Ricky Taylor	2221185776	[M68k] Implement Disassembler This is an implementation of a disassembler for M68k. Differential Revision: https://reviews.llvm.org/D98540	2021-04-19 22:24:12 +01:00
Ricky Taylor	6de262827c	[M68k] Change printing of absolute memory references This also includes PC-relative addresses since they are still referenced as absolute addresses in assembly and converted to relative addresses by the assembler. This changes, for example: - `bra #-2` -> `bra $100` - `jsr #16` -> `jsr $10` Differential Revision: https://reviews.llvm.org/D100697	2021-04-19 22:24:12 +01:00
David Penry	78a871abf7	[ARM] Use ProcResGroup in Cortex-M7 scheduling model Used to model structural hazards on FP issue, where some instructions take up 2 issue slots and others one as well as similar structural hazards on load issue, where some instructions take up two load lanes and others one. Differential Revision: https://reviews.llvm.org/D98977	2021-04-19 21:23:05 +01:00
Thomas Lively	e657c84fa1	[WebAssembly] Use v128.const instead of splats for constants We previously used splats instead of v128.const to materialize vector constants because V8 did not support v128.const. Now that V8 supports v128.const, we can use v128.const instead. Although this increases code size, it should also increase performance (or at least require fewer engine-side optimizations), so it is an appropriate change to make. Differential Revision: https://reviews.llvm.org/D100716	2021-04-19 12:43:59 -07:00
Jinsong Ji	d88d8c5b86	[PowerPC] Disable relative lookup table converter pass for AIX XCOFF hasn't implemented lowerRelativeReference. So we need to disable new pass introduced by https://reviews.llvm.org/D94355 for AIX for now. Reviewed By: gulfem Differential Revision: https://reviews.llvm.org/D100584	2021-04-19 19:28:11 +00:00
madhur13490	6a4d9cb7e0	[AMDGPU] Remove error check for indirect calls and add missing queue-ptr This patch removes -fixed-abi check for indirect calls and also adds queue-ptr which is required for indirect calls to work. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100633	2021-04-20 00:35:17 +05:30
Pavel Iliin	2ec16103c6	[AArch64] Peephole rule to remove redundant cmp after cset. Comparisons to zero or one after cset instructions can be safely removed in examples like: `cset w9, eq; cmp w9, #1; b.ne .L1` ---> `cset w9, eq; b.ne .L1` and `cset w9, eq; cmp w9, #0; b.ne .L1` ---> `cset w9, eq; b.eq .L1`. A peephole optimization is added to detect suitable cases and remove those comparisons. Differential Revision: https://reviews.llvm.org/D98564	2021-04-19 19:58:38 +01:00
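The branch rewrite in 2ec16103c6 can be checked with a small model. This is a minimal Python sketch of the condition logic only, not the LLVM peephole itself; `fold_cmp_after_cset` and `INVERT` are hypothetical names, and the sketch assumes nothing between the `cset` and the branch changes the flags.

```python
# Hypothetical model: which condition should the branch use once the
# "cmp wN, #0/#1" that follows "cset wN, <cond>" has been removed?

# AArch64 condition codes and their inverses.
_PAIRS = [("eq", "ne"), ("hs", "lo"), ("mi", "pl"), ("vs", "vc"),
          ("hi", "ls"), ("ge", "lt"), ("gt", "le")]
INVERT = {}
for a, b in _PAIRS:
    INVERT[a] = b
    INVERT[b] = a


def fold_cmp_after_cset(cset_cond, cmp_imm, branch_cond):
    """Return the new branch condition, or None if the pattern does not apply.

    cset sets the register to 1 when cset_cond holds and to 0 otherwise, so
    comparing it against 0 or 1 only re-tests cset_cond or its inverse.
    """
    if cmp_imm not in (0, 1) or branch_cond not in ("eq", "ne"):
        return None
    # The branch is taken exactly when cset_cond held in two cases:
    # "cmp #1; b.eq" and "cmp #0; b.ne".
    taken_when_set = (cmp_imm == 1) == (branch_cond == "eq")
    return cset_cond if taken_when_set else INVERT[cset_cond]


# The two examples from the commit message.
print(fold_cmp_after_cset("eq", 1, "ne"))
print(fold_cmp_after_cset("eq", 0, "ne"))
```

The two calls correspond to the `cmp w9, #1` and `cmp w9, #0` examples in the commit message.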
Craig Topper	87afefcd22	[RISCV] Fix mistake in comment. NFC	2021-04-19 11:15:32 -07:00
Craig Topper	7ed01a420a	[RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte. As noted in the FIXME there's a sort of agreement that any extra bits stored will be 0. The generated code is pretty terrible. I was really hoping we could use a tail undisturbed trick, but tail undisturbed no longer applies to masked destinations in the current draft spec. Fingers crossed that it isn't common to do this. I doubt IR from clang or the vectorizer would ever create this kind of store. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100618	2021-04-19 11:05:18 -07:00
Jessica Paquette	65f257a215	[AArch64][GlobalISel] Implement custom legalization for s32 and s64 G_CTPOP This is a partial port of AArch64TargetLowering::LowerCTPOP. This custom lowering tries to use NEON instructions to give a more efficient CTPOP lowering when possible. In the non-NEON/noimplicitfloat case, this should use the generic lowering (see: https://godbolt.org/z/GcaPvWe4x). I think that's worth implementing after implementing the widening code for s16/s8 though. Differential Revision: https://reviews.llvm.org/D100399	2021-04-19 10:56:02 -07:00
Nick Desaulniers	c440b97d89	[TargetLowering] move "o" and "X" constraint handling to base class These constraints are machine agnostic; there's no reason to handle these per-arch. If arches don't support these constraints, then they will fail elsewhere during instruction selection. We don't need virtual calls to look these up; TargetLowering::getInlineAsmMemConstraint should only be overridden by architectures with additional unique memory constraints. Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D100416	2021-04-19 10:53:31 -07:00
Jessica Paquette	91bbb914e0	[AArch64][GlobalISel] Regbankselect + select @llvm.aarch64.neon.uaddlv It turns out we actually import a bunch of selection code for intrinsics. The imported code checks that the register banks on the G_INTRINSIC instruction are correct. If so, it goes ahead and selects it. This adds code to AArch64RegisterBankInfo to allow us to correctly determine register banks on intrinsics which have known register bank constraints. For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for porting AArch64TargetLowering::LowerCTPOP. Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction. This seems a little nicer than having to know about how intrinsic instructions are structured. Differential Revision: https://reviews.llvm.org/D100398	2021-04-19 10:47:49 -07:00
Jay Foad	a02aa91313	[AMDGPU] GCNDPPCombine: simplify API of isShrinkable. NFC.	2021-04-19 14:20:46 +01:00
Jay Foad	ef443390a9	[AMDGPU] Remove MachineDCE after SIFoldOperands Remove the MachineDCE pass after the first SIFoldOperands pass now that SIFoldOperands deletes its own dead instructions. Reapply after fixing dependent change D100188. Differential Revision: https://reviews.llvm.org/D100189	2021-04-19 12:08:02 +01:00
Jay Foad	323ef0eb45	[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs This is fairly cheap to implement and means less work for future passes like MachineDCE. Reapply with a fix for using InstToErase after it had been erased. Differential Revision: https://reviews.llvm.org/D100188	2021-04-19 12:05:41 +01:00
Cullen Rhodes	f0bc2782f2	[TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D100377	2021-04-19 11:01:34 +00:00
Dmitry Preobrazhensky	bcc29e0fcf	[AMDGPU][MC] Corrected parsing of carry in/out operands in VOP3 Disabled constants as carry in/out operands. See bug 48711. Differential Revision: https://reviews.llvm.org/D100642	2021-04-19 13:42:31 +03:00
Roman Lebedev	df9597cf5a	[X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap This is similar to the subvector extractions, except that the 0'th subvector isn't free to insert, because we generally don't know whether or not the upper elements need to be preserved: https://godbolt.org/z/rsxP5W4sW This is needed to avoid regressions in D100684 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100698	2021-04-19 13:24:58 +03:00
Fraser Cormack	c9a93c3e01	[RISCV] Lower vector shuffles to vrgather operations This patch extends the lowering of RVV fixed-length vector shuffles to avoid the default stack expansion and instead lower to vrgather instructions. For "permute"-style shuffles where one vector is swizzled, we can lower to one vrgather. For shuffles involving two vector operands, we lower to one unmasked vrgather (or splat, where appropriate) followed by a masked vrgather which blends in the second half. On occasion, when it's not possible to create a legal BUILD_VECTOR for the indices, we use vrgatherei16 instructions with 16-bit index types. For 8-bit element vectors where we may have indices over 255, we have a fairly blunt fallback to the stack expansion to avoid custom-splitting of the vector types. To enable the selection of masked vrgather instructions, this patch extends the various RISCVISD::VRGATHER nodes to take a passthru operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100549	2021-04-19 11:13:13 +01:00
Yaxun (Sam) Liu	3597f02fd5	[AMDGPU] Add GlobalDCE before internalization pass The internalization pass only internalizes global variables with no users. If the global variable has some dead user, the internalization pass will not internalize it. To be able to internalize global variables with dead users, a global dce pass is needed before the internalization pass. This patch adds that. Reviewed by: Artem Belevich, Matt Arsenault Differential Revision: https://reviews.llvm.org/D98783	2021-04-17 11:25:25 -04:00
Serge Guelton	d6de1e1a71	Normalize interaction with boolean attributes Such attributes can either be unset, or set to "true" or "false" (as string). Throughout the codebase, this led to inelegant checks ranging from if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") to if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") Introduce a getValueAsBool that normalizes the check, with the following behavior: no attributes or attribute set to "false" => return false attribute set to "true" => return true Differential Revision: https://reviews.llvm.org/D99299	2021-04-17 08:17:33 +02:00
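The normalization described in d6de1e1a71 can be modelled outside LLVM. This is a minimal Python sketch, assuming string attributes are held in a plain dict; `get_value_as_bool` is a hypothetical stand-in for illustration, not the LLVM API. The commit message does not say how values other than "true" and "false" are handled, so treating them as false is an assumption.

```python
def get_value_as_bool(attrs, name):
    """Return True only when the attribute is present and set to "true".

    A missing attribute and an attribute set to "false" both yield False,
    matching the behaviour listed in the commit message. Any other string
    also yields False here, which is an assumption of this sketch.
    """
    return attrs.get(name) == "true"


# Illustrative attribute set; the keys are example names only.
fn_attrs = {"no-jump-tables": "true", "example-flag": "false"}

print(get_value_as_bool(fn_attrs, "no-jump-tables"))
print(get_value_as_bool(fn_attrs, "example-flag"))
print(get_value_as_bool(fn_attrs, "not-present"))
```

A single lookup replaces both the presence check and the string comparison from the longer form quoted in the commit message.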
Nemanja Ivanovic	ff769dd111	[PowerPC] Minor improvement for insert_vector_elt codegen For v2f64, all VSX subtargets can insert an element with a single XXPERMDI.	2021-04-16 18:52:37 -05:00
Joe Nash	a0ed70abde	[AMDGPU] Remove redundant field from DPP8 def These lines set the value to what it already was, so they are redundant. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100664 Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3	2021-04-16 16:23:52 -04:00
Joe Nash	919236e608	[AMDGPU] NFC, Comment in disassembler for dpp8 Gives reasoning for convertDPP8. Also corrects typo in Operand type comment. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100665 Change-Id: I33ff269db8072d83e5e0ecdbfb731d6000fc26c4	2021-04-16 16:21:47 -04:00
Thomas Lively	5c729750a6	[WebAssembly] Remove saturating fp-to-int target intrinsics Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead. This includes removing the intrinsics for i32x4.trunc_sat_zero_f64x2_{s,u}, which are now represented in IR as a saturating truncation to a v2i32 followed by a concatenation with a zero vector. Differential Revision: https://reviews.llvm.org/D100596	2021-04-16 12:11:20 -07:00
Christudasan Devadasan	97618522dc	[AMDGPU] Remove dead code (NFC).	2021-04-16 23:03:31 +05:30
Joe Nash	7cc4a02fa2	[AMDGPU] Refactor VOP3P Profile and AsmParser, NFC Refactors VOP3P tablegen and the AsmParser for VOP3P for better extensibility. NFC intended Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100602 Change-Id: I038e3a772ac348bb18979cdf3e3ae2e9476dd411	2021-04-16 13:06:50 -04:00
Malhar Jajoo	093f1828e5	[ARM] Prevent phi-node-elimination from generating copy above t2WhileLoopStartLR This patch prevents phi-node-elimination from generating a COPY operation for the register defined by t2WhileLoopStartLR, as it is a terminator that defines a value. This happens because of the presence of phi-nodes in the loop body (the Preheader of which is the block containing the t2WhileLoopStartLR). If this is not done, the COPY is generated above/before the terminator (t2WhileLoopStartLR here), and since it uses the value defined by t2WhileLoopStartLR, MachineVerifier throws a 'use before define' error. This essentially adds on to the change in differential D91887/D97729. Differential Revision: https://reviews.llvm.org/D100376	2021-04-16 16:45:07 +01:00
Roman Lebedev	b06c55a698	[X86][CostModel] Fix cost model for non-power-of-two vector load/stores Sometimes LV has to produce really wide vectors, and sometimes they end up being not powers of two. As it can be seen from the diff, the cost computation is currently completely non-sensical in those cases. Instead of just scalarizing everything, split/factorize the wide vector into a number of subvectors, each one having a power-of-two elements, recurse to get the cost of op on this subvector. Also, check how we'd legalize this subvector, and if the legalized type is scalar, also account for the scalarization cost. Note that for sub-vector loads, we might be able to do better, when the vectors are properly aligned. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100099	2021-04-16 15:30:57 +03:00
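The splitting step described in b06c55a698 can be illustrated on element counts alone. This is a minimal Python sketch, assuming the split follows the binary decomposition of the element count; `split_into_pow2_chunks`, `split_cost` and the per-chunk cost callback are hypothetical names, and the legalization and scalarization checks the commit mentions are not modelled.

```python
from typing import Callable, List


def split_into_pow2_chunks(num_elts: int) -> List[int]:
    """Split an element count into power-of-two chunk sizes, largest first."""
    if num_elts <= 0:
        raise ValueError("element count must be positive")
    chunks = []
    bit = 1 << (num_elts.bit_length() - 1)  # largest power of two <= num_elts
    while bit:
        if num_elts & bit:
            chunks.append(bit)
        bit >>= 1
    return chunks


def split_cost(num_elts: int, cost_of_chunk: Callable[[int], int]) -> int:
    """Sum a caller-supplied cost over each power-of-two chunk."""
    return sum(cost_of_chunk(n) for n in split_into_pow2_chunks(num_elts))


# A 7-element vector is costed as a 4-, a 2- and a 1-element operation.
print(split_into_pow2_chunks(7))
# With a flat cost of 1 per chunk, the total is the number of chunks.
print(split_cost(7, lambda n: 1))
```

The cost callback stands in for the recursive query on each subvector; in the real cost model that query depends on the target and the element type.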
David Green	00a6045473	[ARM] Combine sub 0, csinc X, Y, CC -> csinv -X, Y, CC Combine sub 0, csinc X, Y, CC to csinv -X, Y, CC providing that the negation of X is cheap, currently just handling constants. This comes up during the splat of an i1 to a predicate, where we now generate csetm, as opposed to cset; rsb. Differential Revision: https://reviews.llvm.org/D99940	2021-04-16 11:52:31 +01:00
Nick Desaulniers	bb7016f8f5	[Aarch64] handle "o" inline asm memory constraints The Linux kernel is making use of this inline asm constraint which is causing an ICE. PR49956 Link: https://github.com/ClangBuiltLinux/linux/issues/1348 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100412	2021-04-15 23:36:21 -07:00
Jim Lin	2893570e86	[RISCV] Don't emit save-restore call if function is an interrupt handler It has to save all caller-saved registers before a call in the handler. So don't emit a call that saves/restores registers. Reviewed By: simoncook, luismarques, asb Differential Revision: https://reviews.llvm.org/D100532	2021-04-16 12:54:47 +08:00
hsmahesha	099dcb68a6	[AMDGPU] Refactor ds_read/ds_write related select code for better readability. Part of the code related to ds_read/ds_write ISel is refactored, and the corresponding comment is re-written for better readability, which would help while implementing any future ds_read/ds_write ISel related modifications. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100300	2021-04-16 08:24:00 +05:30
Momchil Velikov	f9d932e673	[clang][AArch64] Correctly align HFA arguments when passed on the stack When we pass an AArch64 Homogeneous Floating-Point Aggregate (HFA) argument with increased alignment requirements, for example struct S { __attribute__ ((__aligned__(16))) double v[4]; }; Clang uses `[4 x double]` for the parameter, which is passed on the stack at alignment 8, whereas it should be at alignment 16, following Rule C.4 in AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules) Currently we don't have a way to express in LLVM IR the alignment requirements of the function arguments. The align attribute is applicable to pointers only, and only for some special ways of passing arguments (e.g. byval). When implementing AAPCS32/AAPCS64, clang resorts to dubious hacks of coercing to types, which naturally have the needed alignment. We don't have enough types to cover all the cases, though. This patch introduces a new use of the stackalign attribute to control stack slot alignment, when and if an argument is passed in memory. The attribute align is left as an optimizer hint - it still applies to pointer types only and pertains to the content of the pointer, whereas the alignment of the pointer itself is determined by the stackalign attribute. For byval arguments, the stackalign attribute assumes the role, previously performed by align, falling back to align if `stackalign` is absent. On the clang side, when passing arguments using the "direct" style (cf. `ABIArgInfo::Kind`), now we can optionally specify an alignment, which is emitted as the new `stackalign` attribute. Patch by Momchil Velikov and Lucas Prates. Differential Revision: https://reviews.llvm.org/D98794	2021-04-15 22:58:14 +01:00
Stanislav Mekhanoshin	13015ebd6f	[AMDGPU] Factor out predicate FmaakFmamkF32Insts Differential Revision: https://reviews.llvm.org/D100409	2021-04-15 12:29:16 -07:00

62465 Commits