llvm-project

Author	SHA1	Message	Date
Luo, Yuanke	6753eb0c90	[X86][AMX] Materialize undef or zero value to tilezero The AMX combiner would store an undef or zero value to the stack and invoke tileload to load the data into a tile register. To avoid the store/load, we can materialize the undef or zero value as tilezero. Differential Revision: https://reviews.llvm.org/D122714	2022-03-31 19:10:28 +08:00
Luo, Yuanke	7471d8b13c	[X86][AMX] Pre-checkin the test case for AMX undef and zero	2022-03-30 17:53:01 +08:00
Luo, Yuanke	1141c8b6fc	[X86][AMX] Fix bug for amx cast transform After combining AMX cast operations, some AMX cast intrinsics may become dead code. This patch deletes such dead code to avoid a crash.	2022-03-30 17:22:30 +08:00
Luo, Yuanke	c4dba47196	[X86][AMX] Don't emit tilerelease for old AMX intrinsic. We should avoid mixing old AMX intrinsics with new AMX intrinsics. For old AMX intrinsics, the user is responsible for invoking tile release. This patch checks whether any tile config is generated by the compiler. If so, it emits the tilerelease instruction; otherwise, it does not emit the instruction. Differential Revision: https://reviews.llvm.org/D114066	2021-11-18 09:28:32 +08:00
Bing1 Yu	bcec4ccd04	[X86] [AMX] Replace bitcast with specific AMX intrinsics with X86 specific cast. There is some discussion on the bitcast for vector and x86_amx at https://reviews.llvm.org/D99152. This patch introduces an x86-specific cast for vector and x86_amx, so that it can avoid some unnecessary optimization by the middle-end. On the other hand, we have to optimize the x86-specific cast ourselves. This patch also optimizes the cast operation to eliminate redundant code. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D107544	2021-08-17 17:04:26 +08:00
Roman Lebedev	0aef747b84	[NFC][X86][Codegen] Megacommit: mass-regenerate all check lines that were already autogenerated The motivation is that the update script has at least two deviations (`<...>@GOT`/`<...>@PLT`/ and not hiding pointer arithmetics) from what pretty much all the checklines were generated with, and most of the tests are still not updated, so each time one of the non-up-to-date tests is updated to see the effect of the code change, there is a lot of noise. Instead of having to deal with that each time, let's just deal with everything at once. This has been done via: ``` cd llvm-project/llvm/test/CodeGen/X86 grep -rl "; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py" | xargs -L1 <...>/llvm-project/llvm/utils/update_llc_test_checks.py --llc-binary <...>/llvm-project/build/bin/llc ``` Not all tests were regenerated, however.	2021-06-11 23:57:02 +03:00
Tomas Matheson	773771ba38	[CodeGen][regalloc] Don't align stack slots if the stack can't be realigned Register allocation may spill virtual registers to the stack, which can increase alignment requirements of the stack frame. If the function did not require stack realignment before register allocation, the registers required to do so may not be reserved/available. This results in a stack frame that requires realignment but cannot be realigned. Instead, only increase the alignment of the stack if we are still able to realign. The register SpillAlignment will be ignored if we can't realign, and the backend will be responsible for emitting the correct unaligned loads and stores. This seems to be the assumed behaviour already, e.g. ARMBaseInstrInfo::storeRegToStackSlot and X86InstrInfo::storeRegToStackSlot are both `canRealignStack` aware. Differential Revision: https://reviews.llvm.org/D103602	2021-06-11 16:49:12 +01:00
Bing1 Yu	56d5c46b49	[X86] Support __tile_stream_loadd intrinsic for new AMX interface Adding support for __tile_stream_loadd intrinsic. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D103784	2021-06-11 17:28:43 +08:00
Xiang1 Zhang	5fc9653faa	Remove x86 test amx-fast-tile-config.mir (by its author) This test contains a lot of manual changes, which make it inconvenient to update, and its checks duplicate those in test amx-configO2toO0.ll	2021-06-02 08:29:36 +08:00
Luo, Yuanke	4ed2b6cccd	[X86][AMX] Fix a bug on tile config. The previous code detected whether an MBB is the bottom block to determine whether it is a backedge of a loop. We should check the latch block instead of the bottom block, and we should check that the header and the bottom block are in the same loop. Differential Revision: https://reviews.llvm.org/D103145	2021-05-26 21:57:49 +08:00
Xiang1 Zhang	d4bdeca576	[X86] Support AMX fast register allocation Differential Revision: https://reviews.llvm.org/D100026	2021-05-08 14:21:11 +08:00
Xiang1 Zhang	bebafe01a7	Revert "[X86] Support AMX fast register allocation" This reverts commit 77e2e5e07d01fe0b83c39d0c527c0d3d2e659146.	2021-05-08 13:43:32 +08:00
Xiang1 Zhang	77e2e5e07d	[X86] Support AMX fast register allocation	2021-05-08 13:27:21 +08:00
Benjamin Kramer	df323ba445	Revert "[X86] Support AMX fast register allocation" This reverts commit 3b8ec86fd576b9808dc63da620d9a4f7bbe04372. Revert "[X86] Refine AMX fast register allocation" This reverts commit c3f95e9197643b699b891ca416ce7d72cf89f5fc. This pass breaks using LLVM in a multi-threaded environment by introducing global state.	2021-04-29 18:56:33 +02:00
Wang, Pengfei	016092d786	Reapply "[X86][AMX] Try to hoist AMX shapes' def" We require no intersections between AMX instructions and their shapes' defs when we insert ldtilecfg. However, this does not always hold, both because users may not follow the AMX API model and because of optimizations. This patch adds a mechanism that tries to hoist AMX shapes' defs as well. It only hoists shapes inside a BB; cases across BBs can be handled in the future. Currently, it only hoists shapes whose sources are all defined above the first AMX instruction. We can improve it for the case where the only source below the AMX instruction is a move of an immediate value into a register. Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D101067	2021-04-27 10:27:59 +08:00
Xiang1 Zhang	c3f95e9197	[X86] Refine AMX fast register allocation	2021-04-25 14:20:53 +08:00
Xiang1 Zhang	3b8ec86fd5	[X86] Support AMX fast register allocation Differential Revision: https://reviews.llvm.org/D100026	2021-04-25 09:45:41 +08:00
Mitch Phillips	caea37b37e	Revert "[X86][AMX] Try to hoist AMX shapes' def" This reverts commit 90118563ad0f133c696e070ad72761fa0daa4517. Reason: Broke the MSan buildbots. https://lab.llvm.org/buildbot/#/builders/5/builds/6967/steps/9/logs/stdio More details can be found in the original phabricator review: https://reviews.llvm.org/D101067	2021-04-23 10:42:26 -07:00
Wang, Pengfei	90118563ad	[X86][AMX] Try to hoist AMX shapes' def We require no intersections between AMX instructions and their shapes' defs when we insert ldtilecfg. However, this does not always hold, both because users may not follow the AMX API model and because of optimizations. This patch adds a mechanism that tries to hoist AMX shapes' defs as well. It only hoists shapes inside a BB; cases across BBs can be handled in the future. Currently, it only hoists shapes whose sources are all defined above the first AMX instruction. We can improve it for the case where the only source below the AMX instruction is a move of an immediate value into a register. Differential Revision: https://reviews.llvm.org/D101067	2021-04-23 12:17:00 +08:00
Wang, Pengfei	a3b52a9d13	[X86][AMX] Refactor for PostRA ldtilecfg pass. This is a follow-up of D99010. We didn't consider the live range of shape registers when hoisting ldtilecfg. There may be risks, e.g. we might insert it into an invalid range of some registers and get an unexpected error. This patch fixes this problem by storing each shape value to the corresponding stack slot of ldtilecfg immediately after each of its definitions. This patch also fixes a problem in the previous code: if we don't have an ldtilecfg that dominates all AMX instructions, we cannot initialize shapes for the other ldtilecfgs. There are still some optimization points left, e.g. eliminating unused mov instructions, breaking the def-use dependency before RA, etc. Reviewed By: LuoYuanke, xiangzhangllvm Differential Revision: https://reviews.llvm.org/D99966	2021-04-14 10:08:23 +08:00
Wang, Pengfei	4cbaaf4a24	[X86][AMX] Hoist ldtilecfg The previous code calculated the first ldtilecfg by dominating all AMX registers' defs. This may result in the ldtilecfg being inserted into a loop. This patch tries to calculate the nearest point where all shapes of AMX registers are reachable. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D99010	2021-04-12 22:36:41 +08:00
Bing1 Yu	747111ea71	[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D99244	2021-04-12 13:58:14 +08:00
Bing1 Yu	0c63b862c4	Revert "[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation" This reverts commit 275df61f043ccf86a9c17957379bff9434da1489.	2021-03-30 16:33:07 +08:00
Bing1 Yu	275df61f04	[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D99244	2021-03-30 16:21:10 +08:00
Bing1 Yu	113f077f80	[X86] Pass to transform tdpbf16ps intrinsics to scalar operation. In the previous patch https://reviews.llvm.org/D93594, we only scalarized tilezero, tileload, tilestore and tiledpbssd. In this patch we scalarize the tdpbf16ps intrinsic. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D96110	2021-03-22 13:00:40 +08:00
Luo, Yuanke	661c016f68	[X86][AMX] Add test cases for AMX load/store lowering. Differential Revision: https://reviews.llvm.org/D99030	2021-03-22 09:14:52 +08:00
Wang, Pengfei	2327513b85	[X86] Fix a bug when calculating the ldtilecfg insertion points. The BB in which we initialize the ldtilecfg is special. We don't need to check whether its predecessor BBs need to insert ldtilecfg for calls. We reused the flag HasCallBeforeAMX, so that the predecessors won't be added to CfgNeedInsert. This case happens only when the entry BB is in a loop. We need to hoist the first tile config point out of the loop in the future. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D98845	2021-03-20 17:48:59 +08:00
Wang, Pengfei	209a626ede	[X86][NFC] Pre-commit test case for the fix of ldtilecfg insertion.	2021-03-18 17:17:03 +08:00
Bing1 Yu	4f198b0c27	[X86] Pass to transform amx intrinsics to scalar operation. This pass runs in all situations, but we skip it when the optimization level is not O0 and the function doesn't have the optnone attribute. With -O0, the def of the shape for AMX intrinsics is near the AMX intrinsics code. We are not able to find a point which post-dominates all the shapes and dominates all AMX intrinsics. To decouple the dependency on the shape, we transform AMX intrinsics to scalar operations, so that compiling doesn't fail. In the long term, we should improve fast register allocation to allocate AMX registers. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93594	2021-03-16 10:40:22 +08:00
Simon Pilgrim	3fd2fa1220	Revert rG8198d83965ba4b9db6922b44ef3041030b2bac39: "[X86] Pass to transform amx intrinsics to scalar operation." This reverts commit 8198d83965ba4b9db6922b44ef3041030b2bac39 due to buildbot breakages	2021-03-05 11:09:14 +00:00
Luo, Yuanke	8198d83965	[X86] Pass to transform amx intrinsics to scalar operation. This pass runs in all situations, but we skip it when the optimization level is not O0 and the function doesn't have the optnone attribute. With -O0, the def of the shape for AMX intrinsics is near the AMX intrinsics code. We are not able to find a point which post-dominates all the shapes and dominates all AMX intrinsics. To decouple the dependency on the shape, we transform AMX intrinsics to scalar operations, so that compiling doesn't fail. In the long term, we should improve fast register allocation to allocate AMX registers. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93594	2021-03-05 16:02:02 +08:00
Wang, Pengfei	42e025f9de	[X86] Disable rematerialization for PTILELOADDV Per the discussion in D97453. We currently disable it because it's not a common scenario and its implementation has some problems. Differential Revision: https://reviews.llvm.org/D97453	2021-02-27 21:08:58 +08:00
Wang, Pengfei	ad9091c5fa	[X86] Allow PTILEZEROV and PTILELOADDV to be rematerializable Spilling and reloading AMX registers is expensive. We allow PTILEZEROV and PTILELOADDV to be rematerializable to avoid register spilling. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D97453	2021-02-26 21:55:59 +08:00
Liu, Chen3	4bc7c8631a	[X86] Support amx-bf16 intrinsic. Adding support for intrinsics of AMX-BF16. This patch also fixes a bug where AMX-INT8 instructions were selected with the wrong predicate. Differential Revision: https://reviews.llvm.org/D97358	2021-02-25 09:06:48 +08:00
Liu, Chen3	f8b9035aae	[X86] Support amx-int8 intrinsic. Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD. Differential Revision: https://reviews.llvm.org/D97259	2021-02-23 17:08:05 +08:00
Luo, Yuanke	8f48ddd193	[X86][AMX] Lower tile copy instruction. Since there is no tile copy instruction, we need to store the tile register to the stack and load it from the stack into another tile register. We need an extra GPR to hold the stride, and we need a stack slot to hold the tile register's data. We run this pass after copy propagation, so that we don't miss copy optimizations. And we run this pass before prolog/epilog insertion, so that we can allocate the stack slot. Differential Revision: https://reviews.llvm.org/D97112	2021-02-23 07:49:42 +08:00
Wang, Pengfei	e9c11c1934	[X86] Zero AMX config buffer for non AVX512 cases. Zero AMX config buffer for non AVX512 cases. Differential Revision: https://reviews.llvm.org/D96927	2021-02-18 13:26:09 +08:00
Wang, Pengfei	9dcfb95ba2	[X86] Add AVX2/SSE2 checks for AMX config buffer zeroing. NFC	2021-02-18 11:30:12 +08:00
Wang, Pengfei	a5d9e0c79b	[X86] Fix tile config register spill issue. This is an optimized approach for D94155. The previous code built a model in which the tile config register is a user of each AMX instruction. There is a problem with spilling the tile config register: across function calls, an ldtilecfg instruction may be inserted for each AMX instruction that uses the tile config register, which clobbers all tile data registers. To fix this issue, we remove the model of the tile config register. Instead, we analyze the AMX instructions between one call and the next. We will insert ldtilecfg after the first call if we find any AMX instructions. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D95136	2021-01-30 12:53:57 +08:00
Luo, Yuanke	bf64918150	[X86][AMX] Prevent shape def being scheduled across ldtilecfg. Differential Revision: https://reviews.llvm.org/D95582	2021-01-28 16:20:16 +08:00
Luo, Yuanke	64132f541e	Revert "[X86][AMX] Fix tile config register spill issue." This reverts commit 20013d02f3352a88d0838eed349abc9a2b0e9cc0.	2021-01-21 18:11:43 +08:00
Luo, Yuanke	20013d02f3	[X86][AMX] Fix tile config register spill issue. The previous code built a model in which the tile config register is a user of each AMX instruction. There is a problem with spilling the tile config register: across function calls, an ldtilecfg instruction may be inserted for each AMX instruction that uses the tile config register, which clobbers all tile data registers. To fix this issue, we remove the model of the tile config register. We analyze the regmask of each call instruction and insert ldtilecfg if any tile data register is live across the call. Inserting sttilecfg before the call is unnecessary, because the tile config doesn't change and we can just reload the config. Besides, we also need to check tile config register interference. Since we don't model the config register, we should check interference from the ldtilecfg to each tile data register def. ldtilecfg / \ BB1 BB2 / \ call BB3 / \ %1=tileload %2=tilezero We can start from each tile def instruction and walk backward to ldtilecfg. If there is any call instruction and the tile data register is not preserved, we should insert ldtilecfg after the call instruction. Differential Revision: https://reviews.llvm.org/D94155	2021-01-21 16:01:50 +08:00
Luo, Yuanke	e147eccafa	[X86][AMX] Clear AMX lit test case. Add nounwind attribute to avoid generating cfi instructions. Also make the global buffer 64-byte aligned in lit test case. Differential Revision: https://reviews.llvm.org/D94910	2021-01-19 11:25:44 +08:00
Luo, Yuanke	c535a7fdad	[X86] Fix tile spill merge issue. This is an additional bug fix for c5be0e0cc0. The distance for the spill instructions was wrong in the previous patch. Differential Revision: https://reviews.llvm.org/D94772	2021-01-19 10:51:42 +08:00
Luo, Yuanke	c5be0e0cc0	[X86] Fix tile register spill issue. The tile register spill needs 2 instructions. %46:gr64_nosp = MOV64ri 64 TILESTORED %stack.2, 1, killed %46:gr64_nosp, 0, $noreg, %43:tile The first instruction loads the stride into a GPR, and the second instruction stores the tile register to the stack slot. The optimization that merges spill instructions is done after register allocation, and spilling a tile register needs a new virtual register for the stride, so we can't hoist the tile spill instruction in postOptimization() of register allocation. We can't hoist TILESTORED alone, and we can't hoist the 2 instructions together because MOV64ri will clobber some GPR. This patch disables the spill merge for any spill which needs 2 instructions. Differential Revision: https://reviews.llvm.org/D93898	2021-01-11 18:35:09 +08:00
Luo, Yuanke	08665b1805	Support tilezero intrinsic and c interface for AMX. Differential Revision: https://reviews.llvm.org/D92837	2020-12-31 13:24:57 +08:00
Luo, Yuanke	6e9755bb80	[X86] Refactor AMX test case, remove unnecessary code. Differential Revision: https://reviews.llvm.org/D93792	2020-12-30 15:40:20 +08:00
Luo, Yuanke	981a0bd858	[X86] Add x86_amx type for intel AMX. The x86_amx type is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instructions. So AMX intrinsics only operate on type x86_amx. This helps to separate AMX intrinsics from LLVM IR instructions (+-*/). Thanks to Craig for the idea. This patch depends on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927	2020-12-30 13:52:13 +08:00
Luo, Yuanke	18925dd872	[X86] Add test case for commit e52bc1d2bba794b. Differential Revision: https://reviews.llvm.org/D93173	2020-12-15 11:14:16 +08:00
Luo, Yuanke	f80b29878b	[X86] AMX programming model. This patch implements the AMX programming model that was discussed in llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thanks to Hal for the good suggestion in the RA. The fast RA is not in the patch yet. This patch implements 7 components. 1. The C interface to the end user. 2. The AMX intrinsics in LLVM IR. 3. Transform load/store <256 x i32> to AMX intrinsics or split the type into two <128 x i32>. 4. The lowering from AMX intrinsics to AMX pseudo instructions. 5. Insert pseudo ldtilecfg and build the def-use between ldtilecfg and AMX instructions. 6. The register allocation for tile registers. 7. Morph AMX pseudo instructions to AMX real instructions. Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0 Differential Revision: https://reviews.llvm.org/D87981	2020-12-10 17:01:54 +08:00


51 Commits
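Several commits in this log revolve around the 64-byte tile configuration that ldtilecfg loads: zeroing the AMX config buffer, storing shape values into the ldtilecfg stack slot, and 64-byte aligning the config buffer. The following is a minimal C++ sketch of that memory layout. It assumes palette 1 as described in the Intel SDM. The struct and helper names (TileConfig, initTileConfig, setTileShape) are hypothetical and are not LLVM's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the 64-byte LDTILECFG memory operand (palette 1
// layout, per the Intel SDM). Not LLVM's own data structure.
struct alignas(64) TileConfig {
  std::uint8_t palette_id;    // byte 0: 1 selects the AMX palette
  std::uint8_t start_row;     // byte 1: restart row for interrupted loads/stores
  std::uint8_t reserved[14];  // bytes 2..15: must be zero
  std::uint16_t colsb[16];    // bytes 16..47: bytes per row of each tile
  std::uint8_t rows[16];      // bytes 48..63: number of rows of each tile
};

static_assert(sizeof(TileConfig) == 64, "LDTILECFG reads exactly 64 bytes");
static_assert(offsetof(TileConfig, colsb) == 16, "colsb starts at byte 16");
static_assert(offsetof(TileConfig, rows) == 48, "rows starts at byte 48");

// Zero the whole buffer first: reserved bytes and entries of unused tiles
// must be 0, so stale stack contents cannot leak into the configuration.
inline void initTileConfig(TileConfig &cfg) {
  std::memset(&cfg, 0, sizeof(cfg));
  cfg.palette_id = 1;
}

// Record the shape (rows x bytes-per-row) of one tile register.
inline void setTileShape(TileConfig &cfg, unsigned tile, std::uint8_t rows,
                         std::uint16_t colsb) {
  assert(tile < 8 && rows <= 16 && colsb <= 64 && "palette 1 limits");
  cfg.rows[tile] = rows;
  cfg.colsb[tile] = colsb;
}
```

Zeroing and then writing only the used shapes mirrors the order that the log describes the compiler following: it zeroes the config buffer, then stores each shape value into its slot before ldtilecfg runs.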
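The scalarization commits (tdpbssd; tdpbsud, tdpbusd and tdpbuud; tdpbf16ps) replace a tile dot-product intrinsic with equivalent scalar loops when AMX registers cannot be allocated at -O0. To illustrate what such a lowering has to compute, here is a scalar reference for TDPBSSD (signed int8 times signed int8, accumulated into int32). It assumes the semantics given in the Intel SDM. The function name and the flat row-major vectors are hypothetical, and the pass in this log operates on LLVM IR rather than C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for TDPBSSD: C (M x N, int32) += A (M x K,
// int8) * B (int8). B uses the packed layout TDPBSSD expects: K/4 rows, each
// holding N groups of 4 consecutive int8 values, so the 4 bytes that feed one
// int32 of C are adjacent. Hardware wraps on int32 overflow; this sketch
// assumes inputs small enough that signed accumulation never overflows.
void tdpbssdScalar(std::vector<std::int32_t> &C,
                   const std::vector<std::int8_t> &A,
                   const std::vector<std::int8_t> &B, int M, int N, int K) {
  assert(K % 4 == 0 && "K counts int8 elements packed in groups of 4");
  assert(C.size() == std::size_t(M) * N);
  assert(A.size() == std::size_t(M) * K);
  assert(B.size() == std::size_t(K) * N);
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      std::int32_t acc = C[m * N + n];
      for (int k = 0; k < K / 4; ++k) {
        // Dot product of 4 signed bytes from row m of A with the 4 signed
        // bytes of group n in packed row k of B.
        for (int i = 0; i < 4; ++i) {
          acc += std::int32_t(A[m * K + 4 * k + i]) *
                 std::int32_t(B[k * 4 * N + 4 * n + i]);
        }
      }
      C[m * N + n] = acc;
    }
  }
}
```

For example, with M = 1, N = 2, K = 4, A = {1, 2, 3, 4}, B = {1, 1, 1, 1, -1, 0, 0, 2} and C = {10, 0}, the call leaves C = {20, 7}. The unsigned variants (tdpbsud, tdpbusd, tdpbuud) would differ only in which operand bytes are zero-extended instead of sign-extended.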