llvm-project

Commit Graph

Author	SHA1	Message	Date
Liu, Chen3	b70e02a7e7	[X86][NFC] Move instruction selection of the x86_tdpb[s,u]d_internal and x86_tilezero_internal to X86InstrAMX.td Differential Revision: https://reviews.llvm.org/D97997	2021-03-09 21:27:39 +08:00
Simon Pilgrim	87d5b34c24	[X86] X86ISelDAGToDAG.cpp - include cstdint instead of stdint.h NFCI. Fixes clang-tidy warning	2021-03-05 15:58:20 +00:00
Simon Pilgrim	f11f86c114	[X86] X86DAGToDAGISel::Select - merge X86::TEST load bitsize checks. NFCI.	2021-03-05 15:58:20 +00:00
Liu, Chen3	4bc7c8631a	[X86] Support amx-bf16 intrinsic. Adding support for intrinsics of AMX-BF16. This patch alse fix a bug that AMX-INT8 instructions will be selected with wrong predicate. Differential Revision: https://reviews.llvm.org/D97358	2021-02-25 09:06:48 +08:00
Liu, Chen3	f8b9035aae	[X86] Support amx-int8 intrinsic. Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD. Differential Revision: https://reviews.llvm.org/D97259	2021-02-23 17:08:05 +08:00
Wang, Pengfei	a5d9e0c79b	[X86] Fix tile config register spill issue. This is an optimized approach for D94155. Previous code build the model that tile config register is the user of each AMX instruction. There is a problem for the tile config register spill. When across function, the ldtilecfg instruction may be inserted on each AMX instruction which use tile config register. This cause all tile data register clobber. To fix this issue, we remove the model of tile config register. Instead, we analyze the AMX instructions between one call to another. We will insert ldtilecfg after the first call if we find any AMX instructions. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D95136	2021-01-30 12:53:57 +08:00
Craig Topper	74784a5aa4	[X86] In shrinkAndImmediate, place the new constant into the topological sort. Revert the change to use APInt::isSignedIntN from `5ff5cf8e05`. Its clear that the games we were playing to avoid the topological sort aren't working. So just fix it once and for all. Fixes PR48888.	2021-01-26 13:18:04 -08:00
Luo, Yuanke	64132f541e	Revert "[X86][AMX] Fix tile config register spill issue." This reverts commit `20013d02f3`.	2021-01-21 18:11:43 +08:00
Luo, Yuanke	20013d02f3	[X86][AMX] Fix tile config register spill issue. Previous code build the model that tile config register is the user of each AMX instruction. There is a problem for the tile config register spill. When across function, the ldtilecfg instruction may be inserted on each AMX instruction which use tile config register. This cause all tile data register clobber. To fix this issue, we remove the model of tile config register. We analyze the regmask of call instruction and insert ldtilecfg if there is any tile data register live across the call. Inserting the sttilecfg before the call is unneccessary, because the tile config doesn't change and we can just reload the config. Besides we also need check tile config register interference. Since we don't model the config register we should check interference from the ldtilecfg to each tile data register def. ldtilecfg / \ BB1 BB2 / \ call BB3 / \ %1=tileload %2=tilezero We can start from the instruction of each tile def, and backward to ldtilecfg. If there is any call instruction, and tile data register is not preserved, we should insert ldtilecfg after the call instruction. Differential Revision: https://reviews.llvm.org/D94155	2021-01-21 16:01:50 +08:00
Luo, Yuanke	08665b1805	Support tilezero intrinsic and c interface for AMX. Differential Revision: https://reviews.llvm.org/D92837	2020-12-31 13:24:57 +08:00
Luo, Yuanke	981a0bd858	[X86] Add x86_amx type for intel AMX. The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. It can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927	2020-12-30 13:52:13 +08:00
Harald van Dijk	09d0e7a7c1	[X86] Avoid %fs:(%eax) references in x32 mode The ABI explains that %fs:(%eax) zero-extends %eax to 64 bits, and adds that the TLS base address, but that the TLS base address need not be at the start of the TLS block, TLS references may use negative offsets. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D93158	2020-12-16 22:39:57 +00:00
Luo, Yuanke	e52bc1d2bb	[X86] Add chain in ISel for x86_tdpbssd_internal intrinsic.	2020-12-12 21:14:38 +08:00
Luo, Yuanke	f80b29878b	[X86] AMX programming model. This patch implements amx programming model that discussed in llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet. This patch implemeted 7 components. 1. The c interface to end user. 2. The AMX intrinsics in LLVM IR. 3. Transform load/store <256 x i32> to AMX intrinsics or split the type into two <128 x i32>. 4. The Lowering from AMX intrinsics to AMX pseudo instruction. 5. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx intruction. 6. The register allocation for tile register. 7. Morph AMX pseudo instruction to AMX real instruction. Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0 Differential Revision: https://reviews.llvm.org/D87981	2020-12-10 17:01:54 +08:00
Craig Topper	5ff5cf8e05	[X86] Use APInt::isSignedIntN instead of isIntN for 64-bit ANDs in X86DAGToDAGISel::IsProfitableToFold Pretty sure we meant to be checking signed 32 immediates here rather than unsigned 32 bit. I suspect I messed this up because in MathExtras.h we have isIntN and isUIntN so isIntN differs in signedness depending on whether you're using APInt or plain integers. This fixes a case where we didn't fold a constant created by shrinkAndImmediate. Since shrinkAndImmediate doesn't topologically sort constants it creates, we can fail to convert the Constant to a TargetConstant. This leads to very strange behavior later. Fixes PR48458.	2020-12-09 13:39:07 -08:00
H.J. Lu	18ce612353	Use PC-relative address for x32 TLS address Since x32 supports PC-relative address, it shouldn't use EBX for TLS address. Instead of checking N.getValueType(), we should check Subtarget->is32Bit(). This fixes PR 22676. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D16474	2020-12-02 22:20:36 +00:00
Craig Topper	f385823e04	[X86] Alternate implementation of D88194. This uses PreprocessISelDAG to replace the constant before instruction selection instead of matching opcodes after. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D89178	2020-10-27 00:20:03 -07:00
Wei Wang	d602e79a81	[X86] Encode global address in small code model In small code model, program and its symbols are linked in the lower 2 GB of the address space. Try encoding global address even when the range is unknown in such case. Differential Revision: https://reviews.llvm.org/D89341	2020-10-26 23:14:06 -07:00
Craig Topper	a7e45ea30d	[X86] Add memory operand to AESENC/AESDEC Key Locker instructions. This removes FIXMEs from selectAddr.	2020-10-03 21:42:16 -07:00
Craig Topper	39fc4a0b0a	[X86] Move ENCODEKEY128/256 handling from lowering to selection. We should avoid emitting MachineSDNodes from lowering. We can use the the implicit def handling in InstrEmitter to avoid manually copying from each xmm result register. We only need to manually emit the copies for the implicit uses.	2020-10-03 18:44:53 -07:00
Craig Topper	adccc0bfa3	[X86] Add X86ISD opcodes for the Key Locker AESENCKL and AESDECKL instructions Instead of emitting MachineSDNodes during lowering, emit X86ISD opcodes. These opcodes will either be selected by tablegen patterns or custom selection code. Emitting MachineSDNodes during lowering is uncommon so this makes things more consistent. It also allows selectAddr to be called to perform address matching during instruction selection. I had trouble getting tablegen to accept XMM0-XMM7 as results in an isel pattern for the WIDE instructions so I had to use custom instruction selection.	2020-10-03 16:55:19 -07:00
Craig Topper	46673763fe	[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract Fixes PR47482	2020-09-14 16:59:04 -07:00
Craig Topper	da1aaa0b70	Revert "[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract." I got the bug number wrong. This reverts commit `3251593890`.	2020-09-14 16:58:57 -07:00
Craig Topper	3251593890	[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract. Fixes PR47525	2020-09-14 16:28:37 -07:00
Simon Pilgrim	23d9f4b958	[X86] Fix llvm-qualified-auto warning by using auto*. NFC.	2020-09-03 14:21:17 +01:00
Craig Topper	ab7151f1cf	[X86] Make PreprocessISelDAG create X86ISD::VRNDSCALE nodes with i32 constants instead of i8. This is the type declared in X86InstrFragmentsSIMD.td. ISel pattern matching doesn't check so it doesn't matter in practice. Maybe for SelectionDAG CSE it would matter.	2020-08-17 17:25:51 -07:00
Craig Topper	9201efb3b9	[X86] Custom match X86ISD::VPTERNLOG in X86ISelDAGToDAG in order to reduce isel patterns. By factoring out the end of tryVPTERNLOG, we can use the same code to directly match X86ISD::VPTERNLOG. This allows us to remove around 3-4K worth of X86GenDAGISel.inc.	2020-08-10 23:15:58 -07:00
Wang, Pengfei	9512525947	[X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics. When we use mask compare intrinsics under strict FP option, the masked elements shouldn't raise any exception. So, we cann't replace the intrinsic with a full compare + "and" operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D85385	2020-08-11 10:28:41 +08:00
Craig Topper	966a58e329	[X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP.	2020-08-08 13:11:47 -07:00
Craig Topper	75f134eec1	[X86] Refactor the broadcast and load folding in tryVPTESTM to reduce some code. Now we try to load and broadcast together for operand 1. Followed by load and broadcast for operand 1. Previously we tried load operand 1, load operand 1, broadcast operand 0, broadcast operand 1. Now we have a single helper that tries load and broadcast for one operand that we can just call twice.	2020-07-31 23:57:13 -07:00
Craig Topper	1bd7046e4c	[X86] Use TargetLowering::getRegClassFor to simplify some code in tryVPTESTM. NFCI	2020-07-31 21:39:10 -07:00
Craig Topper	93c678a79b	[X86] Simplify vpternlog immediate selection. Rather than hardcoding immediate values for 12 different combinations in a nested pair of switches, we can perform the matched logic operation on 3 magic constants to calculate the immediate. Special thanks to this tweet https://twitter.com/rygorous/status/1187034321992871936 for making me realize I could do this.	2020-07-31 17:16:27 -07:00
Craig Topper	c4823b24a4	[X86] Add custom lowering for llvm.roundeven with sse4.1. We can use the roundss/sd/ps/pd instructions like we do for ceil/floor/trunc/rint/nearbyint. Differential Revision: https://reviews.llvm.org/D84592	2020-07-29 10:23:08 -07:00
Craig Topper	df12524e6b	[X86] Turn X86DAGToDAGISel::tryVPTERNLOG into a fully custom instruction selector that can handle bitcasts between logic ops Previously we just matched the logic ops and replaced with an X86ISD::VPTERNLOG node that we would send through the normal pattern match. But that approach couldn't handle a bitcast between the logic ops. Extending that approach would require us to peek through the bitcasts and emit new bitcasts to match the types. Those new bitcasts would then have to be properly topologically sorted. This patch instead switches to directly emitting the MachineSDNode and skips the normal tablegen pattern matching. We do have to handle load folding and broadcast load folding ourselves now. Which also means commuting the immediate control. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83630	2020-07-26 12:19:08 -07:00
Eric Christopher	e958379581	Fold the opt size check into the assert to silence an unused variable warning.	2020-07-13 16:05:24 -07:00
Hiroshi Yamauchi	fb558ccae7	[PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. Differential Revision: https://reviews.llvm.org/D83331	2020-07-13 10:28:09 -07:00
Xiang1 Zhang	939d8309db	[X86-64] Support Intel AMX Intrinsic INTEL ADVANCED MATRIX EXTENSIONS (AMX). AMX is a new programming paradigm, it has a set of 2-dimensional registers (TILES) representing sub-arrays from a larger 2-dimensional memory image and operate on TILES. These intrinsics use direct TMM register number as its params. Spec can be found in Chapter 3 here https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D83111	2020-07-07 10:13:40 +08:00
Craig Topper	e75f2d5a8c	[X86] Add matching support for X86ISD::ANDNP to X86DAGToDAGISel::tryVPTERNLOG.	2020-07-03 17:50:35 -07:00
Craig Topper	52855ed099	[X86] Add back support for matching VPTERNLOG from back to back logic ops. I think this mostly looks ok. The only weird thing I noticed was a couple rotate vXi8 tests picked up an extra logic op where we have (and (or (and), (andn)), X). Previously we matched the (or (and), (andn)) to vpternlog, but now we match the (and (or), X) and leave the and/andn unmatched.	2020-07-02 22:11:52 -07:00
Craig Topper	1df1186ab1	[X86] Use some preprocessor macros to reduce the very similar repeated code in getVPTESTMOpc. NFCI This function picks X86 opcode name based on type, masking, and whether not a load or broadcast has been folded using multiple switch statements. The contents of the switches mostly just vary in a few characters in the instruction name. So use some macros to build the instruction names to reduce the repetiveness.	2020-06-30 14:38:22 -07:00
Guillaume Chatelet	a976ea3209	[Alignment][NFC] Migrate PPC, X86 and XCore backends to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82779	2020-06-30 08:08:45 +00:00
Craig Topper	d72cb4ce21	Recommit "[X86] Separate imm from relocImm handling." Fix the copy/paste mistake that caused it to fail previously	2020-06-15 10:59:43 -07:00
Hans Wennborg	f47a776628	Revert "[X86] Separate imm from relocImm handling." > relocImm was a complexPattern that handled both ConstantSDNode > and X86Wrapper. But it was only applied selectively because using > it would cause patterns to be not importable into FastISel or > GlobalISel. So it only got applied to flag setting instructions, > stores, RMW arithmetic instructions, and rotates. > > Most of the test changes are a result of making patterns available > to GlobalISel or FastISel. The absolute-cmp.ll change is due to > this fixing a pattern ordering issue to make an absolute symbol > match to an 8-bit immediate before trying a 32-bit immediate. > > I tried to use PatFrags to reduce the repetition, but I was getting > errors from TableGen. This caused "Invalid EmitNode" assertions, see the llvm-commits thread for discussion.	2020-06-15 16:14:59 +02:00
Craig Topper	8885a7640b	[X86] Separate imm from relocImm handling. relocImm was a complexPattern that handled both ConstantSDNode and X86Wrapper. But it was only applied selectively because using it would cause patterns to be not importable into FastISel or GlobalISel. So it only got applied to flag setting instructions, stores, RMW arithmetic instructions, and rotates. Most of the test changes are a result of making patterns available to GlobalISel or FastISel. The absolute-cmp.ll change is due to this fixing a pattern ordering issue to make an absolute symbol match to an 8-bit immediate before trying a 32-bit immediate. I tried to use PatFrags to reduce the repetition, but I was getting errors from TableGen.	2020-06-13 11:29:28 -07:00
Craig Topper	1385ab356a	[X86] Use X86AS enum constants to replace hardcoded numbers in more places. NFC	2020-06-10 22:31:21 -07:00
Craig Topper	324e13668e	[X86] Split imm handling out of selectMOV64Imm32 and add a separate isel pattern. This makes the pattern available to global isel.	2020-06-10 11:12:36 -07:00
Sanjay Patel	6f6d2d2383	[x86] refine conditions for immediate hoisting to save code-size As shown in PR46237: https://bugs.llvm.org/show_bug.cgi?id=46237 The size-savings win for hoisting an 8-bit ALU immediate (intentionally excluding store constants) requires extreme conditions; it may not even be possible when including REX prefix bytes on x86-64. I did draft a version of this patch that included use counts after the loop, but I suspect that accounting is not working as expected. I think that is because the number of constant uses are changing as we select instructions (for example as we transform shl/add into LEA). Differential Revision: https://reviews.llvm.org/D81468	2020-06-09 15:44:55 -04:00
Guozhi Wei	587af86f1d	[X86] Add a flag to guard the wide load As shown in http://lists.llvm.org/pipermail/llvm-dev/2020-May/141854.html, widen load can also cause stall. Add a flag to guard the widening code, so users can disable it and evaluate its performance impact. Differential Revision: https://reviews.llvm.org/D80943	2020-06-02 16:16:13 -07:00
Craig Topper	961c1b5f72	[X86] Remove DeleteNode calls from PreprocessISelDAG. Rely on the RemoveDeadNodes call at the end. Add a MadeChange flag so we don't call RemoveDeadNodes unless something changed.	2020-06-02 14:10:20 -07:00
Craig Topper	07e8a780d8	[X86] Add pseudo instructions to use MULX with a single destination when the low result isn't used. The instruction is defined to only produce high result if both destinations are the same. We can exploit this to avoid unnecessarily clobbering a register. In order to hide this from register allocation we use a pseudo instruction and expand the result during MCInst creation. Differential Revision: https://reviews.llvm.org/D80500	2020-05-30 16:01:01 -07:00

1 2 3 4 5 ...

1010 Commits