llvm-project

Author	SHA1	Message	Date
Nikita Popov	6e83c0a1cb	[X86] Convert tests to opaque pointers (NFC)	2024-02-05 12:43:44 +01:00
Evgenii Kudriashov	cfd91199ca	[X86] Skip unused VRegs traverse (#78229) Almost all loops with getNumVirtRegs skip unused registers by means of reg_nodbg_empty or an empty live interval, except for these two cases, which are revealed by GlobalISel since it can skip RegClass assignment for unused registers. Closes #64452, closes #71926	2024-01-26 23:57:14 +01:00
XinWang10	dd6fec5d4f	[X86][APX]Support lowering for APX promoted AMX-TILE instructions (#78689) The enc/dec of promoted AMX-TILE instructions has been supported since https://github.com/llvm/llvm-project/pull/76210. This patch supports lowering for promoted AMX-TILE instructions and integrates the tests into existing tests.	2024-01-22 11:33:23 +08:00
yubingex007-a11y	f2517cbcee	[X86][AMX] remove related code of X86PreAMXConfigPass (#69569) In https://reviews.llvm.org/D125075, we switched to using FastPreTileConfig at O0 and abandoned X86PreAMXConfigPass, so we can safely remove the code related to X86PreAMXConfigPass.	2023-10-20 13:43:34 +08:00
Harald van Dijk	a21abc782a	[X86] Align i128 to 16 bytes in x86 datalayouts This is an attempt at rebooting https://reviews.llvm.org/D28990 I've included AutoUpgrade changes to modify the data layout to satisfy the compatible layout check. But this does mean alloca, loads, stores, etc in old IR will automatically get this new alignment. This should fix PR46320. Reviewed By: echristo, rnk, tmgross Differential Revision: https://reviews.llvm.org/D86310	2023-10-11 10:23:38 +01:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Jay Foad	e0919b189b	[CodeGen] Renumber slot indexes before register allocation (#66334) RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps. This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.	2023-09-19 11:18:12 +01:00
Jay Foad	102838d3f6	update_mir_test_checks.py: match undef vreg subreg definitions (#66627) Following on from D139466 which added support for dead vreg defs, this patch adds support for "undef" defs of subregs. Use this to regenerate checks for amx-greedy-ra-spill-shape.ll which previously required manual tweaks to the autogenerated checks to fix an EXPENSIVE_CHECKS failure; see commit 8b7c1fbd9647a5a6ef246a6b5b2543ea0f5a2337	2023-09-18 12:14:46 +01:00
Nikita Popov	1c6e6432ca	[SCEVExpander] Fix incorrect reuse of more poisonous instructions (PR63763) SCEVExpander tries to reuse existing instruction with the same SCEV expression. However, doing this replacement blindly is not safe, because the instruction might be more poisonous. What we were already doing is to drop poison-generating flags on the reused instruction. But this is not the only way that more poison can be introduced. The poison-generating flag might not be directly on the reused instruction, or the poison contribution might come from something like 0 * %var, which folds to 0 but can still introduce poison. This patch fixes the issue in a principled way, by determining which values can contribute poison to the SCEV expression, and then checking whether any additional values can contribute poison to the instruction being reused. Poison-generating flags are dropped if doing that enables reuse. This is a pretty big hammer and does cause some regressions in tests, but less than I would have expected. I wasn't able to come up with a less intrusive fix that still satisfies the correctness requirements. Fixes https://github.com/llvm/llvm-project/issues/63763. Fixes https://github.com/llvm/llvm-project/issues/63926. Fixes https://github.com/llvm/llvm-project/issues/64333. Fixes https://github.com/llvm/llvm-project/issues/63727. Differential Revision: https://reviews.llvm.org/D158181	2023-08-22 09:27:07 +02:00
Jay Foad	fdbc944385	Fix typos in comments	2023-08-15 13:57:21 +01:00
Bing1 Yu	516e32678d	[X86][AMX] set Stride to Tile's Col when doing combine amxcast and store into tilestore %tile = call x86_amx @llvm.x86.tileloadd64.internal(i16 8, i16 32, i8* %src_ptr, i64 64) %vec = call <256 x i8> @llvm.x86.cast.tile.to.vector.v256i8(x86_amx...%tile) store <256 x i8> %vec, <256 x i8>* %dst_ptr, align 256 => %tile = call x86_amx @llvm.x86.tileloadd64.internal(i16 8, i16 32, i8* %src_ptr, i64 64) %stride = sext i16 32 to i64 call void @llvm.x86.tilestored64.internal(i16 8, i16 32, i8* %dst_ptr, i64 32, x86_amx %tile) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D153002	2023-06-20 11:55:25 +08:00
Shengchen Kan	c81a121f3f	Revert "Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI"" This reverts commit cb16b33a03aff70b2499c3452f2f817f3f92d20d. In fact, the test https://bugs.chromium.org/p/chromium/issues/detail?id=1446973#c2 already passed after 5586bc539acb26cb94e461438de01a5080513401	2023-05-19 22:21:56 +08:00
Hans Wennborg	cb16b33a03	Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI" This caused compiler assertions, see comment on https://reviews.llvm.org/D150107. This also reverts the dependent follow-up change: > [X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI > > This is follow-up of D150107. > > In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be > shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp. > > Differential Revision: https://reviews.llvm.org/D150949 This reverts commit 2ef8ae134828876ab3ebda4a81bb2df7b095d030 and 5586bc539acb26cb94e461438de01a5080513401.	2023-05-19 14:43:33 +02:00
Shengchen Kan	5586bc539a	[X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI This is follow-up of D150107. In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp. Differential Revision: https://reviews.llvm.org/D150949	2023-05-19 18:22:30 +08:00
Xiang1 Zhang	038b7e6b76	[X86] Support AMX Complex instructions Reviewed By: Wang Pengfei Differential Revision: https://reviews.llvm.org/D147420	2023-04-04 09:54:46 +08:00
Florian Hahn	8b7c1fbd96	[X86] Update check lines that are not properly auto-generated. It looks like some CHECK lines did not use patterns for virtual registers and the register numbering is slightly different with EXPENSIVE_CHECKS. Use patterns manually.	2023-01-13 18:32:50 +00:00
Florian Hahn	20ecc07991	[MachineCombiner] Lift same-bb restriction for reassociable ops. This patch relaxes the restriction that both reassociate operands must be in the same block as the root instruction. The comment indicates that the reason for this restriction was that the operands not in the same block won't have a depth in the trace. I believe this is outdated; if the operand is in a different block, it must dominate the current block (otherwise it would need to be a phi), which in turn means the operand's block must be included in the current trace, and depths must be available. There's a test case (no_reassociate_different_block) added in 70520e2f1c5fc4 which shows that we have accurate depths for operands defined in other blocks. This allows reassociation of code that computes the final reduction value after vectorization, among other things. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D141302	2023-01-13 15:32:44 +00:00
Nikita Popov	60442f0d44	[CodeGen] Convert some tests to opaque pointers (NFC) These are mostly MIR tests, which I did not handle during previous conversions.	2023-01-05 13:21:20 +01:00
Jonas Paulsson	5ecd363295	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." This reverts commit 122efef8ee9be57055d204d52c38700fe933c033. - Patch fixed to not reuse definitions from predecessors in EH landing pads. - Late review suggestions (by MaskRay) have been addressed. - M68k/pipeline.ll test updated. - Init captures added in processBlock() to avoid capturing structured bindings. - RISCV has this disabled for now. Original commit message: A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameIndex(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-05 12:53:50 -06:00
Jonas Paulsson	122efef8ee	Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."" This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3. Some more bots got broken - need to investigate.	2022-12-05 00:52:00 +01:00
Jonas Paulsson	17db0de330	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." Init captures added in processBlock() to avoid capturing structured bindings, which caused the build problems (with clang). RISCV has this disabled for now until problems relating to post RA pseudo expansions are resolved.	2022-12-03 14:15:15 -06:00
Jonas Paulsson	8ef4632681	Revert "[CodeGen] Add new pass for late cleanup of redundant definitions." Temporarily revert and fix buildbot failure. This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.	2022-12-01 13:29:24 -05:00
Jonas Paulsson	6d12599fd4	[CodeGen] Add new pass for late cleanup of redundant definitions. A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameIndex(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-01 13:21:35 -05:00
Xiang1 Zhang	94c5df8a76	[AMX] Support AMX-FP16 new intrinsic interface AMX-FP16 ISA support was added in https://reviews.llvm.org/D135941. The old intrinsic interface requires tile registers to be written manually, so this patch supports the new intrinsic interface, which lets the compiler do register allocation. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D138987	2022-12-01 09:47:53 +08:00
Luo, Yuanke	7d59b337f6	[X86][AMX] Fix the shape dependency issue. AMX shape should be defined before AMX intrinsics. However, in the case below, the shape a.row is defined after the tile load of b. If we transform `load b` to the `@llvm.x86.tileloadd64` intrinsic, the shape dependency is not satisfied. ``` void test_tile_dpbsud(__tile1024i a, __tile1024i b, __tile1024i c) { __tile_dpbsud(&c, a, b); } ``` This patch stores the tile b to the stack and reloads it after the def of b.row. This causes a redundant store/load, but it is a simple way to avoid generating invalid IR. A better way may be to hoist `def b.row` before the tile load instruction, but it seems more complicated to recursively hoist its operands. Differential Revision: https://reviews.llvm.org/D137923	2022-11-16 10:47:11 +08:00
Matthias Braun	189900eb14	X86: Stop assigning register costs for longer encodings. This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`. This was previously done because instruction encodings require a REX prefix when using them, resulting in longer instruction encodings. I found that this regresses the quality of the register allocation as the costs impose an ordering on eviction candidates. I also feel that there is a bit of an impedance mismatch as the actual costs occur when encoding instructions using those registers, but the order of VReg assignments is not primarily ordered by number of Defs+Uses. I did extensive measurements with the llvm-test-suite with SPEC2006 + SPEC2017 included; internal services showed similar patterns. Generally there are a lot of improvements but also a lot of regressions. But on average the allocation quality seems to improve at a small code size regression. Results for measuring static and dynamic instruction counts: Dynamic Counts (scaled by execution frequency) / Optimization Remarks: Spills+FoldedSpills -5.6%; Reloads+FoldedReloads -4.2%; Copies -0.1%. Static / LLVM Statistics: regalloc.NumSpills mean -1.6%, geomean -2.8%; regalloc.NumReloads mean -1.7%, geomean -3.1%; size..text mean +0.4%, geomean +0.4%. Static / LLVM Statistics: regalloc.NumSpills mean -2.2%, geomean -3.1%; regalloc.NumReloads mean -2.6%, geomean -3.9%; size..text mean +0.6%, geomean +0.6%. Static / LLVM Statistics: regalloc.NumSpills mean -3.0%; regalloc.NumReloads mean -3.3%; size..text mean +0.3%, geomean +0.3%. Differential Revision: https://reviews.llvm.org/D133902	2022-09-30 16:01:33 -07:00
Xiang1 Zhang	c836ddaf72	[X86][NFC] Refine load/store reg to StackSlot for extensibility Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D133078	2022-09-07 14:35:42 +08:00
Luo, Yuanke	5cb0979870	[X86][AMX] Split greedy RA for tile register When we fill the shape into the tile configuration memory, the shape is obtained from the AMX pseudo instruction. However, the register for the shape may be split or spilled by greedy RA. That causes the shape to be filled into the config memory after ldtilecfg is executed, so the shape configuration would be wrong. This patch splits tile register allocation from greedy register allocation, so that after tile registers are allocated the shape registers are still virtual registers. The shape register may only be redefined or multi-defined by the phi elimination pass and the two-address pass, which doesn't affect tile register configuration. Differential Revision: https://reviews.llvm.org/D128584	2022-06-29 10:35:43 +08:00
Nikita Popov	0eb17a9d86	[X86][AMX] Update tests to use opaque pointers (NFC) There are some codegen differences here, because presence of bitcasts affects AMX codegen in minor ways (the bitcasts are not always in the input IR, but may be added by X86PreAMXConfig for example). Differential Revision: https://reviews.llvm.org/D128424	2022-06-23 14:37:45 +02:00
Nikita Popov	88e64490c1	[X86] Update some AMX tests to use opaque pointers (NFC) This only touches IR tests (or tests without codegen changes).	2022-06-23 12:22:08 +02:00
Nikita Popov	1061511008	[X86PreAMXConfig] Use IRBuilder to insert instructions (NFC) Use an IRBuilder to insert instructions in preWriteTileCfg(). While here, also remove some unnecessary bool return values. There are some test changes because the IRBuilder folds "trunc i16 8 to i8" to "i8 8", and that has knock-on effects on instruction naming. I ran into this when converting tests to opaque pointers and noticed that this pass introduces unnecessary "bitcast ptr to ptr" instructions.	2022-06-22 17:28:48 +02:00
Nikita Popov	ff5301dde9	[X86] Regenerate test checks (NFC) This runs the test through -instnamer and generates test checks using update_test_checks.py. (The previous comment indicated that update_llc_test_checks.py was used, but I rather doubt that.) This relies on the non-determinism fix from fbb72530fe80a95678a7d643d7a3f5ee8d693c93; the previous check lines have apparently been written to accommodate that non-determinism.	2022-06-22 17:00:10 +02:00
Nikita Popov	4c921aa3f5	[X86] Name instructions in test (NFC) Run the test through -instnamer, to make it easier to modify.	2022-06-22 16:53:15 +02:00
Nikita Popov	2f448bf509	[X86] Migrate tests to use opaque pointers (NFC) Test updates were performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34 These are only the test updates where the test passed without further modification (which is almost all of them, as the backend is largely pointer-type agnostic).	2022-06-22 14:38:25 +02:00
Luo, Yuanke	aaaf9cede7	[X86][AMX] Replace LDTILECFG with PLDTILECFGV on auto-config. There is an intrinsic `@llvm.x86.ldtilecfg` which is lowered to LDTILECFG. This intrinsic is open for users to configure tile registers by themselves. There is a chance that `@llvm.x86.ldtilecfg` would be mixed with the new AMX intrinsics, which depend on the compiler to configure tile registers. A separate pseudo instruction PLDTILECFGV avoids unexpected behaviour when `@llvm.x86.ldtilecfg` is mixed with the new AMX intrinsics. Though users should not mix the two programming models, the compiler should avoid a crash or UB when they are mixed. Differential Revision: https://reviews.llvm.org/D126519	2022-05-27 16:38:35 +08:00
Luo, Yuanke	535604e177	[X86][AMX] Update test case with automation tool.	2022-05-27 10:35:05 +08:00
Luo, Yuanke	496156ac57	[X86][AMX] Multiple configure for AMX register. The previous solution depends on variable names to record the shape information. However, that is not reliable, because in release builds the compiler does not set the variable names. It can be accomplished with an additional option `fno-discard-value-names`, but that is not acceptable for users. This patch preconfigures the tile register with machine instructions. It follows the same approach as single configure. In the future we can fall back to multiple configure when single configure fails due to the shape dependency issue. The algorithm to configure the tile register is simple in this patch; we may improve it in the future. It configures tile registers per basic block. The compiler spills the tile register if it is live out of the basic block. After the configure there should be no spill across a tile configure in register allocation. Just like fast register allocation, the algorithm walks the instructions in reverse order. When the shape dependency is not satisfied, it inserts ldtilecfg after the last instruction that defines the shape. In post configuration the compiler also walks the basic block to collect the physical tile register numbers and generates instructions to fill the stack slot with the corresponding shape information. TODO: There is some follow-up work in D125602. The risk is that modifying the fast RA may cause regressions, as fast RA is used for different targets. We may create an independent RA for tile registers. Differential Revision: https://reviews.llvm.org/D125075	2022-05-24 13:18:42 +08:00
Luo, Yuanke	d5d498f9ba	[X86][AMX] Simplify AMX test case. Extract test for zero tile configure into a small test case.	2022-05-08 19:12:54 +08:00
Luo, Yuanke	373ce14760	[X86][AMX] Replace PXOR instruction with SET0 in AMX pre config. To generate a zero value, the PXOR instruction needs 3 operands that are tied to the same vreg. It is not good in SSA form, and with undef values the two-address instruction pass may convert `%0:vr128 = PXORrr undef %0, undef %0` to `%1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128`, which is not expected. It can be simplified to the SET0 instruction, which only takes 1 destination operand. It should be more friendly to the two-address instruction pass and the register allocation pass. `%0:vr128 = V_SET0` Also add an AVX1 code path so that it is consistent with other code. Differential Revision: https://reviews.llvm.org/D124903	2022-05-05 10:44:57 +08:00
Luo, Yuanke	942ec5c36d	[X86][AMX] combine tile cast and load/store instruction. The `llvm.x86.cast.tile.to.vector` intrinsic is lowered to `llvm.x86.tilestored64.internal` and `load <256 x i32>`. The `llvm.x86.cast.vector.to.tile` is lowered to `store <256 x i32>` and `llvm.x86.tileloadd64.internal`. When `llvm.x86.cast.tile.to.vector` is used by `store <256 x i32>` or `load <256 x i32>` is used by `llvm.x86.cast.vector.to.tile`, they can be combined by `llvm.x86.tilestored64.internal` and `llvm.x86.tileloadd64.internal`. Differential Revision: https://reviews.llvm.org/D124378	2022-04-28 14:55:21 +08:00
Luo, Yuanke	f3ad7ea03a	[X86][AMX] Report error when shapes are not pre-defined. Instead of reporting a fatal error, this patch emits an error message and exits when shapes are not pre-defined. This causes compilation to fail but not crash. Differential Revision: https://reviews.llvm.org/D124342	2022-04-26 14:57:25 +08:00
Luo, Yuanke	c712bf3ce4	[X86][AMX] Add test case for D124378.	2022-04-25 20:03:27 +08:00
Luo, Yuanke	690bed0cec	[X86][AMX] Fix infinite loop of getShape. When walking the user chain to get the shape of a phi node, if there is a phi node in the chain, we should walk to the user of that phi node instead of the original phi node.	2022-04-10 14:44:51 +08:00
Luo, Yuanke	6753eb0c90	[X86][AMX] Materialize undef or zero value to tilezero The AMX combiner would store undef or zero to the stack and invoke tileload to load the data into a tile register. To avoid the store/load, we can materialize the undef or zero value to tilezero. Differential Revision: https://reviews.llvm.org/D122714	2022-03-31 19:10:28 +08:00
Luo, Yuanke	7471d8b13c	[X86][AMX] Pre-checkin the test case for AMX undef and zero	2022-03-30 17:53:01 +08:00
Luo, Yuanke	1141c8b6fc	[X86][AMX] Fix bug for amx cast transform After combining amx cast operations, some amx cast intrinsics may be dead code. This patch deletes such dead code to avoid a crash.	2022-03-30 17:22:30 +08:00
Luo, Yuanke	c4dba47196	[X86][AMX] Don't emit tilerelease for old AMX intrinsic. We should avoid mixing old AMX intrinsics with new AMX intrinsics. For old AMX intrinsics, the user is responsible for invoking tile release. This patch checks whether any tile config is generated by the compiler. If so, it emits the tilerelease instruction; otherwise it doesn't emit the instruction. Differential Revision: https://reviews.llvm.org/D114066	2021-11-18 09:28:32 +08:00
Bing1 Yu	bcec4ccd04	[X86] [AMX] Replace bitcast with specific AMX intrinsics with X86 specific cast. There is some discussion on the bitcast for vector and x86_amx at https://reviews.llvm.org/D99152. This patch introduces an x86-specific cast for vector and x86_amx, so that it can avoid some unnecessary optimization by the middle-end. On the other hand, we have to optimize the x86-specific cast ourselves. This patch also optimizes the cast operation to eliminate redundant code. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D107544	2021-08-17 17:04:26 +08:00
Roman Lebedev	0aef747b84	[NFC][X86][Codegen] Megacommit: mass-regenerate all check lines that were already autogenerated The motivation is that the update script has at least two deviations (`<...>@GOT`/`<...>@PLT`/ and not hiding pointer arithmetics) from what pretty much all the checklines were generated with, and most of the tests are still not updated, so each time one of the non-up-to-date tests is updated to see the effect of the code change, there is a lot of noise. Instead of having to deal with that each time, let's just deal with everything at once. This has been done via: ``` cd llvm-project/llvm/test/CodeGen/X86 grep -rl "; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py" | xargs -L1 <...>/llvm-project/llvm/utils/update_llc_test_checks.py --llc-binary <...>/llvm-project/build/bin/llc ``` Not all tests were regenerated, however.	2021-06-11 23:57:02 +03:00


95 Commits