llvm-project

Commit Graph

Author	SHA1	Message	Date
Luo, Yuanke	5cb0979870	[X86][AMX] Split greedy RA for tile register When we fill the shape to tile configure memory, the shape is gotten from AMX pseudo instruction. However the register for the shape may be split or spilled by greedy RA. That cause we fill the shape to config memory after ldtilecfg is executed, so that the shape configuration would be wrong. This patch is to split the tile register allocation from greedy register allocation, so that after tile registers are allocated the shape registers are still virtual register. The shape register only may be redefined or multi-defined by phi elimination pass, two address pass. That doesn't affect tile register configuration. Differential Revision: https://reviews.llvm.org/D128584	2022-06-29 10:35:43 +08:00
Nikita Popov	0eb17a9d86	[X86][AMX] Update tests to use opaque pointers (NFC) There are some codegen differences here, because presence of bitcasts affects AMX codegen in minor ways (the bitcasts are not always in the input IR, but may be added by X86PreAMXConfig for example). Differential Revision: https://reviews.llvm.org/D128424	2022-06-23 14:37:45 +02:00
Nikita Popov	88e64490c1	[X86] Update some AMX tests to use opaque pointers (NFC) This only touches IR tests (or tests without codegen changes).	2022-06-23 12:22:08 +02:00
Nikita Popov	1061511008	[X86PreAMXConfig] Use IRBuilder to insert instructions (NFC) Use an IRBuilder to insert instructions in preWriteTileCfg(). While here, also remove some unnecessary bool return values. There are some test changes because the IRBuilder folds "trunc i16 8 to i8" to "i8 8", and that has knock-on effects on instruction naming. I ran into this when converting tests to opaque pointers and noticed that this pass introduces unnecessary "bitcast ptr to ptr" instructions.	2022-06-22 17:28:48 +02:00
Nikita Popov	ff5301dde9	[X86] Regenerate test checks (NFC) This runs the test through -instnamer and generates test checks using update_test_checks.py. (The previous comment indicated that update_llc_test_checks.py was used, but I rather doubt that.) This relies on the non-determinism fix from `fbb72530fe`, the previous check lines have apparently been written to accomodate that non-determinism.	2022-06-22 17:00:10 +02:00
Nikita Popov	4c921aa3f5	[X86] Name instructions in test (NFC) Run the test through -instnamer, to make it easier to modify.	2022-06-22 16:53:15 +02:00
Nikita Popov	2f448bf509	[X86] Migrate tests to use opaque pointers (NFC) Test updates were performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 These are only the test updates where the test passed without further modification (which is almost all of them, as the backend is largely pointer-type agnostic).	2022-06-22 14:38:25 +02:00
Luo, Yuanke	aaaf9cede7	[X86][AMX] Replace LDTILECFG with PLDTILECFGV on auto-config. There is intrinsic `@llvm.x86.ldtilecfg` which is lowered to LDTILECFG. This intrinsic is open for user to configure tile registers by themselves. There is a chance that `@llvm.x86.ldtilecfg` would be mixed with the new AMX intrinsics which depend on compiler to configure tile registers. Separate pusedo instruction PLDTILECFGV would avoid unexpected behavious when `@llvm.x86.ldtilecfg` is mixed with new AMX intrinsics. Though user should not mix the two programming model, compiler should avoid crash or UB when they are mixed. Differential Revision: https://reviews.llvm.org/D126519	2022-05-27 16:38:35 +08:00
Luo, Yuanke	535604e177	[X86][AMX] Update test case with automation tool.	2022-05-27 10:35:05 +08:00
Luo, Yuanke	496156ac57	[X86][AMX] Multiple configure for AMX register. The previous solution depends on variable name to record the shape information. However it is not reliable, because in release build compiler would not set the variable name. It can be accomplished with an additional option `fno-discard-value-names`, but it is not acceptable for users. This patch is to preconfigure the tile register with machine instruction. It follow the same way what sigle configure does. In the future we can fall back to multiple configure when single configure fails due to the shape dependency issue. The algorithm to configure the tile register is simple in the patch. We may improve it in the future. It configure tile register based on basic block. Compiler would spill the tile register if it live out the basic block. After the configure there should be no spill across tile confgiure in the register alloction. Just like fast register allocation the algorithm walk the instruction in reverse order. When the shape dependency doesn't meet, it insert ldtilecfg after the last instruction that define the shape. In post configuration compiler also walk the basic block to collect the physical tile register number and generate instruction to fill the stack slot for the correponding shape information. TODO: There is some following work in D125602. The risk is modifying the fast RA may cause regression as fast RA is usded for different targets. We may create an independent RA for tile register. Differential Revision: https://reviews.llvm.org/D125075	2022-05-24 13:18:42 +08:00
Luo, Yuanke	d5d498f9ba	[X86][AMX] Simplify AMX test case. Extract test for zero tile configure into a small test case.	2022-05-08 19:12:54 +08:00
Luo, Yuanke	373ce14760	[X86][AMX] Replace PXOR instruction with SET0 in AMX pre config. To generate zero value, the PXOR instruction need 3 operands that is tied to the same vreg. If is not good in SSA form and with undef value two address instruction pass may convert `%0:vr128 = PXORrr undef %0, undef %0` to `%1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128`. It is not expected. It can be simplified to SET0 instruction which only take 1 destination operand. It should be more friendly to two address instruction pass and register allocation pass. `%0:vr128 = V_SET0` Also add AVX1 code path so that it is consistant to other code. Differential Revision: https://reviews.llvm.org/D124903	2022-05-05 10:44:57 +08:00
Luo, Yuanke	942ec5c36d	[X86][AMX] combine tile cast and load/store instruction. The `llvm.x86.cast.tile.to.vector` intrinsic is lowered to `llvm.x86.tilestored64.internal` and `load <256 x i32>`. The `llvm.x86.cast.vector.to.tile` is lowered to `store <256 x i32>` and `llvm.x86.tileloadd64.internal`. When `llvm.x86.cast.tile.to.vector` is used by `store <256 x i32>` or `load <256 x i32>` is used by `llvm.x86.cast.vector.to.tile`, they can be combined by `llvm.x86.tilestored64.internal` and `llvm.x86.tileloadd64.internal`. Differential Revision: https://reviews.llvm.org/D124378	2022-04-28 14:55:21 +08:00
Luo, Yuanke	f3ad7ea03a	[X86][AMX] Report error when shapes are not pre-defined. Instead of report fatal error, this patch emit error message and exit when shapes are not pre-defined. This would cause the compiling fail but not crash. Differential Revision: https://reviews.llvm.org/D124342	2022-04-26 14:57:25 +08:00
Luo, Yuanke	c712bf3ce4	[X86][AMX] Add test case for D124378.	2022-04-25 20:03:27 +08:00
Luo, Yuanke	690bed0cec	[X86][AMX] Fix infinite loop of getShape. When walk the user chain to get the shape of a phi node. If it is phi node in the chain, we should walk to the user of this phi node instead of the original phi node.	2022-04-10 14:44:51 +08:00
Luo, Yuanke	6753eb0c90	[X86][AMX] Materialize undef or zero value to tilezero The AMX combiner would store undef or zero to stack and invoke tileload to load the data to tile register. To avoid the store/load, we can materialzie undef or zero value to tilezero. Differential Revision: https://reviews.llvm.org/D122714	2022-03-31 19:10:28 +08:00
Luo, Yuanke	7471d8b13c	[X86][AMX] Pre-checkin the test case for AMX undef and zero	2022-03-30 17:53:01 +08:00
Luo, Yuanke	1141c8b6fc	[X86][AMX] Fix bug for amx cast tranform After combining amx cast operation, some amx cast intrinsic may be dead code. This patch is to delete such dead code and avoid crash.	2022-03-30 17:22:30 +08:00
Luo, Yuanke	c4dba47196	[X86][AMX] Don't emit tilerelease for old AMX instrisic. We should avoid mixing old AMX instrinsic with new AMX intrinsic. For old AMX intrinsic, user is responsible for invoking tile release. This patch is to check if there is any tile config generated by compiler. If so it emit tilerelease instruction, otherwise it don't emit the instruction. Differential Revision: https://reviews.llvm.org/D114066	2021-11-18 09:28:32 +08:00
Bing1 Yu	bcec4ccd04	[X86] [AMX] Replace bitcast with specific AMX intrinsics with X86 specific cast. There is some discussion on the bitcast for vector and x86_amx at https://reviews.llvm.org/D99152. This patch is to introduce a x86 specific cast for vector and x86_amx, so that it can avoid some unnecessary optimization by middle-end. On the other way, we have to optimize the x86 specific cast by ourselves. This patch also optimize the cast operation to eliminate redundant code. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D107544	2021-08-17 17:04:26 +08:00
Roman Lebedev	0aef747b84	[NFC][X86][Codegen] Megacommit: mass-regenerate all check lines that were already autogenerated The motivation is that the update script has at least two deviations (`<...>@GOT`/`<...>@PLT`/ and not hiding pointer arithmetics) from what pretty much all the checklines were generated with, and most of the tests are still not updated, so each time one of the non-up-to-date tests is updated to see the effect of the code change, there is a lot of noise. Instead of having to deal with that each time, let's just deal with everything at once. This has been done via: ``` cd llvm-project/llvm/test/CodeGen/X86 grep -rl "; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py" \| xargs -L1 <...>/llvm-project/llvm/utils/update_llc_test_checks.py --llc-binary <...>/llvm-project/build/bin/llc ``` Not all tests were regenerated, however.	2021-06-11 23:57:02 +03:00
Tomas Matheson	773771ba38	[CodeGen][regalloc] Don't align stack slots if the stack can't be realigned Register allocation may spill virtual registers to the stack, which can increase alignment requirements of the stack frame. If the the function did not require stack realignment before register allocation, the registers required to do so may not be reserved/available. This results in a stack frame that requires realignment but can not be realigned. Instead, only increase the alignment of the stack if we are still able to realign. The register SpillAlignment will be ignored if we can't realign, and the backend will be responsible for emitting the correct unaligned loads and stores. This seems to be the assumed behaviour already, e.g. ARMBaseInstrInfo::storeRegToStackSlot and X86InstrInfo::storeRegToStackSlot are both `canRealignStack` aware. Differential Revision: https://reviews.llvm.org/D103602	2021-06-11 16:49:12 +01:00
Bing1 Yu	56d5c46b49	[X86] Support __tile_stream_loadd intrinsic for new AMX interface Adding support for __tile_stream_loadd intrinsic. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D103784	2021-06-11 17:28:43 +08:00
Xiang1 Zhang	5fc9653faa	Remove x86 test amx-fast-tile-config.mir (by its author) This test contains a lot of manual changes which is not convenient to update, and the checks are duplicated with test amx-configO2toO0.ll	2021-06-02 08:29:36 +08:00
Luo, Yuanke	4ed2b6cccd	[X86][AMX] Fix a bug on tile config. The previous code detect if a MBB is bottom block to determine if it is a backedge of a loop. We should check latch block instead of bottom block and we should check the header and the bottom block are in the same loop. Differential Revision: https://reviews.llvm.org/D103145	2021-05-26 21:57:49 +08:00
Xiang1 Zhang	d4bdeca576	[X86] Support AMX fast register allocation Differential Revision: https://reviews.llvm.org/D100026	2021-05-08 14:21:11 +08:00
Xiang1 Zhang	bebafe01a7	Revert "[X86] Support AMX fast register allocation" This reverts commit `77e2e5e07d`.	2021-05-08 13:43:32 +08:00
Xiang1 Zhang	77e2e5e07d	[X86] Support AMX fast register allocation	2021-05-08 13:27:21 +08:00
Benjamin Kramer	df323ba445	Revert "[X86] Support AMX fast register allocation" This reverts commit `3b8ec86fd5`. Revert "[X86] Refine AMX fast register allocation" This reverts commit `c3f95e9197`. This pass breaks using LLVM in a multi-threaded environment by introducing global state.	2021-04-29 18:56:33 +02:00
Wang, Pengfei	016092d786	Reapply "[X86][AMX] Try to hoist AMX shapes' def" We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always ture resulting from not only users don't follow AMX API model, but also optimizations. This patch adds a mechanism that tries to hoist AMX shapes' def as well. It only hoists shapes inside a BB, we can improve it for cases across BBs in future. Currently, it only hoists shapes of which all sources' def above the first AMX instruction. We can improve for the case that only source that moves an immediate value to a register below AMX instruction. Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D101067	2021-04-27 10:27:59 +08:00
Xiang1 Zhang	c3f95e9197	[X86] Refine AMX fast register allocation	2021-04-25 14:20:53 +08:00
Xiang1 Zhang	3b8ec86fd5	[X86] Support AMX fast register allocation Differential Revision: https://reviews.llvm.org/D100026	2021-04-25 09:45:41 +08:00
Mitch Phillips	caea37b37e	Revert "[X86][AMX] Try to hoist AMX shapes' def" This reverts commit `90118563ad`. Reason: Broke the MSan buildbots. https://lab.llvm.org/buildbot/#/builders/5/builds/6967/steps/9/logs/stdio More details can be found in the original phabricator review: https://reviews.llvm.org/D101067	2021-04-23 10:42:26 -07:00
Wang, Pengfei	90118563ad	[X86][AMX] Try to hoist AMX shapes' def We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always ture resulting from not only users don't follow AMX API model, but also optimizations. This patch adds a mechanism that tries to hoist AMX shapes' def as well. It only hoists shapes inside a BB, we can improve it for cases across BBs in future. Currently, it only hoists shapes of which all sources' def above the first AMX instruction. We can improve for the case that only source that moves an immediate value to a register below AMX instruction. Differential Revision: https://reviews.llvm.org/D101067	2021-04-23 12:17:00 +08:00
Wang, Pengfei	a3b52a9d13	[X86][AMX] Refactor for PostRA ldtilecfg pass. This is a follow up of D99010. We didn't consider the live range of shape registers when hoist ldtilecfg. There maybe risks, e.g. we happen to insert it to an invalid range of some registers and get unexpected error. This patch fixes this problem by storing the value to corresponding stack place of ldtilecfg after all its definition immediately. This patch also fix a problem in previous code: If we don't have a ldtilecfg which dominates all AMX instructions, we cannot initialize shapes for other ldtilecfg. There're still some optimization points left. E.g. eliminate unused mov instructions, break the def-use dependency before RA etc. Reviewed By: LuoYuanke, xiangzhangllvm Differential Revision: https://reviews.llvm.org/D99966	2021-04-14 10:08:23 +08:00
Wang, Pengfei	4cbaaf4a24	[X86][AMX] Hoist ldtilecfg The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop. This patch try to calculate the nearest point where all shapes of AMX registers are reachable. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D99010	2021-04-12 22:36:41 +08:00
Bing1 Yu	747111ea71	[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D99244	2021-04-12 13:58:14 +08:00
Bing1 Yu	0c63b862c4	Revert "[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation" This reverts commit `275df61f04`.	2021-03-30 16:33:07 +08:00
Bing1 Yu	275df61f04	[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D99244	2021-03-30 16:21:10 +08:00
Bing1 Yu	113f077f80	[X86] Pass to transform tdpbf16ps intrinsics to scalar operation. In previous patch https://reviews.llvm.org/D93594, we only scalarize tilezero, tileload, tilestore and tiledpbssd. In this patch we scalarize tdpbf16ps intrinsic. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D96110	2021-03-22 13:00:40 +08:00
Luo, Yuanke	661c016f68	[X86][AMX] Add test cases for AMX load/store lowering. Differential Revision: https://reviews.llvm.org/D99030	2021-03-22 09:14:52 +08:00
Wang, Pengfei	2327513b85	[X86] Fix a bug when calculating the ldtilecfg insertion points. The BB we initialized the ldtilecfg is special. We don't need to check if its predecessor BBs need to insert ldtilecfg for calls. We reused the flag HasCallBeforeAMX, so that the predecessors won't be added to CfgNeedInsert. This case happens only when the entry BB is in a loop. We need to hoist the first tile config point out of the loop in future. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D98845	2021-03-20 17:48:59 +08:00
Wang, Pengfei	209a626ede	[X86][NFC] Pre-commit test case for the fix of ldtilecfg insertion.	2021-03-18 17:17:03 +08:00
Bing1 Yu	4f198b0c27	[X86] Pass to transform amx intrinsics to scalar operation. This pass runs in any situations but we skip it when it is not O0 and the function doesn't have optnone attribute. With -O0, the def of shape to amx intrinsics is near the amx intrinsics code. We are not able to find a point which post-dominate all the shape and dominate all amx intrinsics. To decouple the dependency of the shape, we transform amx intrinsics to scalar operation, so that compiling doesn't fail. In long term, we should improve fast register allocation to allocate amx register. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93594	2021-03-16 10:40:22 +08:00
Simon Pilgrim	3fd2fa1220	Revert rG8198d83965ba4b9db6922b44ef3041030b2bac39: "[X86] Pass to transform amx intrinsics to scalar operation." This reverts commit 8198d83965ba4b9db6922b44ef3041030b2bac39.due to buildbot breakages	2021-03-05 11:09:14 +00:00
Luo, Yuanke	8198d83965	[X86] Pass to transform amx intrinsics to scalar operation. This pass runs in any situations but we skip it when it is not O0 and the function doesn't have optnone attribute. With -O0, the def of shape to amx intrinsics is near the amx intrinsics code. We are not able to find a point which post-dominate all the shape and dominate all amx intrinsics. To decouple the dependency of the shape, we transform amx intrinsics to scalar operation, so that compiling doesn't fail. In long term, we should improve fast register allocation to allocate amx register. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93594	2021-03-05 16:02:02 +08:00
Wang, Pengfei	42e025f9de	[X86] Disable rematerializion for PTILELOADDV Per the discussion in D97453. We currently disable it due to it's not a common scenario and has some problem in implementation. Differential Revision: https://reviews.llvm.org/D97453	2021-02-27 21:08:58 +08:00
Wang, Pengfei	ad9091c5fa	[X86] Allow PTILEZEROV and PTILELOADDV to be rematerializable Spilling and reloading AMX registers are expensive. We allow PTILEZEROV and PTILELOADDV to be rematerializable to avoid the register spilling. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D97453	2021-02-26 21:55:59 +08:00
Liu, Chen3	4bc7c8631a	[X86] Support amx-bf16 intrinsic. Adding support for intrinsics of AMX-BF16. This patch alse fix a bug that AMX-INT8 instructions will be selected with wrong predicate. Differential Revision: https://reviews.llvm.org/D97358	2021-02-25 09:06:48 +08:00

1 2

67 Commits