llvm-project

Author	SHA1	Message	Date
JP Lehr	e816c89c84	Revert "InlineSpiller: Consider if all subranges are the same when avoiding redundant spills" This reverts commit d8127b2ba8a87a610851b9a462f2fc2526c36e37.	2023-10-02 06:26:33 -05:00
Matt Arsenault	d8127b2ba8	InlineSpiller: Consider if all subranges are the same when avoiding redundant spills This avoids some redundant spills of subranges, and avoids a compile failure. This greatly reduces the numbers of spills in a loop. The main range is not informative when multiple instructions are needed to fully define a register. A common scenario is a lowered reg_sequence where every subregister is sequentially defined, but each def changes the main range's value number. If we look at specific lanes at the use index, we can see the value is actually the same. In this testcase, there are a large number of materialized 64-bit constant defs which are hoisted outside of the loop by MachineLICM. These are feeding REG_SEQUENCES, which is not considered rematerializable inside the loop. After coalescing, the split constant defs produce main ranges with an apparent phi def. There's no phi def if you look at each individual subrange, and only half of the register is really redefined to a constant. Fixes: SWDEV-380865 https://reviews.llvm.org/D147079	2023-10-01 11:37:53 +03:00
Matt Arsenault	7252787dd9	RegAllocGreedy: Fix detection of lanes read by a bundle SplitKit creates questionably formed bundles of copies when it needs to copy a subset of live lanes and can't do it with a single subregister index. These are merely marked as part of a bundle, and don't start with a BUNDLE instruction. Queries for the slot index would give the first copy in the bundle, and we need to inspect the operands of all the other bundled copies. Also fix and simplify detection of read lane subsets. This causes some RISCV test regressions, but these look like accidentally beneficial splits. I don't see a subrange based reason to perform these splits. Avoids some really ugly regressions in a future patch. https://reviews.llvm.org/D146859	2023-10-01 11:37:48 +03:00
Jay Foad	6e3d2a4b38	[ISel] Fix another crash in new FMA DAG combine (#67818) Following on from D135150, this patch fixes another crash caused by this DAG combine: fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E) The combine calls ReplaceAllUsesOfValueWith to replace (fmul C, D) with (fma C, D, E). This can cause nodes to get CSEd. In D135150 the problem was that the (fma C, D, E) node got CSEd away. In this new case, the problem is that the outer fadd node gets CSEd away. To fix it we have to return SDValue(N, 0) from the combine and be careful not to add a deleted node to the worklist.	2023-09-29 17:18:23 +01:00
Nikita Popov	4251aa7a6f	[IRBuilder] Migrate most casts to folding API Migrate creation of most casts to use the FoldXYZ rather than CreateXYZ style APIs. This means that InstSimplifyFolder now works for these, which is what accounts for the AMDGPU test changes.	2023-09-29 12:40:38 +02:00
Mirko Brkušanin	2cd2445c21	[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on supported subtargets (#67461) In order to avoid duplicating every dpp pseudo opcode that has src1, we allow it for all opcodes and add manual checks on subtargets that do not support it.	2023-09-29 11:54:49 +02:00
Yashwant Singh	7ac532efc8	[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091) Use this flag to give more context to implicit def comments in assembly. Reviewed on phabricator: https://reviews.llvm.org/D153754	2023-09-29 11:15:01 +05:30
Tobias Stadler	305fbc1b32	Revert "[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND" This reverts commit 3686a0b611c65f0d7190345b8e3e73cdca9fa657. This seems to have broken some sanitizer tests: https://lab.llvm.org/buildbot/#/builders/184/builds/7721	2023-09-29 03:35:40 +02:00
Tobias Stadler	3686a0b611	[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND The legalizer currently generates lots of G_AND artifacts. For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary. Currently these artifacts have to be removed using post-legalize combines. Omitting these artifacts at their source in the artifact combiner has a few advantages: - We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it. - The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register). - This cleans up a lot of legalizer output and even improves compilation-times. AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count). Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth. This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D159140	2023-09-29 02:11:57 +02:00
Jay Foad	c3939eb827	[AMDGPU] Fix typo in scheduler option name (#67661) Fix: -amdgpu-disable-unclustred-high-rp-reschedule Now: -amdgpu-disable-unclustered-high-rp-reschedule	2023-09-28 20:54:57 +01:00
Jay Foad	a0a06b1804	[AMDGPU] Make a check slightly more robust Previously this was relying on [[RESULT]] having been defined in an earlier function.	2023-09-28 13:09:51 +01:00
Ivan Kosarev	be8b559956	[AMDGPU] Test codegen'ing True16 additions. The GlobalISel part is to be addressed later. Differential Revision: https://reviews.llvm.org/D156106	2023-09-27 11:10:48 +01:00
Ivan Kosarev	3ff7d51eb8	[AMDGPU][True16] Pre-commit addition tests. Differential Revision: https://reviews.llvm.org/D156529	2023-09-27 10:27:33 +01:00
Jay Foad	e3d714f2cc	[AMDGPU] Add gfx1150 test coverage in trans-forwarding-hazards.mir This demonstrates that gfx1150 does not have FeatureVALUTransUseHazard.	2023-09-26 17:24:43 +01:00
Ivan Kosarev	64482d5766	[AMDGPU] Fix passing CodeGen/AMDGPU/frem.ll on gfx1150. (#67425) We would currently crash on it trying to use t16 instructions instead of fake16 ones.	2023-09-26 15:13:23 +01:00
Ivan Kosarev	287f6cdd17	[AMDGPU] Remove the support for non-True16 copies between different register sizes. Differential Revision: https://reviews.llvm.org/D156985	2023-09-26 14:46:34 +01:00
Jingu Kang	ff68e43c81	[MachineLICM] Handle Subloops It is a re-commit from reverted commit 3454cf67bd0a650097dc6ca99874a34e1d59b500. Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass handle subloops with only visiting outermost loop's blocks once. Differential Revision: https://reviews.llvm.org/D154205	2023-09-26 14:25:11 +01:00
Jay Foad	d85d143ad9	[AMDGPU] New image intrinsic optimizer pass (#67151) Implement a new pass to combine multiple image_load_2dmsaa and 2darraymsaa intrinsic calls into a single image_msaa_load if: - they refer to the same vaddr except for sample_id, - they use a constant sample_id and they fall into the same group, - they have the same dmask and the number of instructions and the number of vaddr/vdata dword transfers is reduced by the combine. This should be valid on all GFX11 but a hardware bug renders it unworkable on GFX11.0.* so it is only enabled for GFX11.5. Based on a patch by Rodrigo Dominguez!	2023-09-26 09:33:49 +01:00
Ivan Kosarev	053478bbd0	[AMDGPU] Switch to using real True16 operands. The DPP source and e64 destination operands remain unchanged for now. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D156104	2023-09-25 18:21:13 +01:00
Austin Kerbow	0455596e1e	[AMDGPU] Add DAG ISel support for preloaded kernel arguments This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patches in the series will make the codegen backwards compatible. This patch should only be submitted alongside that subsequent patch. Preloading here begins from the start of the kernel arguments until the amount of arguments indicated by the CL flag amdgpu-kernarg-preload-count. Aggregates and arguments passed by-ref are not supported. Special care for the alignment of the kernarg segment is needed as well as consideration of the alignment of addressable SGPR tuples when we cannot directly use misaligned large tuples that the arguments are loaded to. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D158579	2023-09-25 09:32:59 -07:00
Austin Kerbow	7b70af297a	[AMDGPU] Add IR lowering changes for preloaded kernargs Preloaded kernel arguments should not be lowered in the IR pass AMDGPULowerKernelArguments. Therefore it's necessary to calculate the total number of user SGPRs that are available for preloading and how many SGPRs would be required to preload each argument to determine whether we should skip lowering i.e. the argument will be preloaded instead. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D156853	2023-09-25 08:54:07 -07:00
Diana Picus	327fdcf789	Revert "AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882)" This reverts commit a04603993b43e5ebac1531293d288315f1885886 because it broke the OpenMP buildbot.	2023-09-25 13:40:38 +02:00
Diana	a04603993b	AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882) Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or INSERT_SUBREG where the result is an SGPR, but some of the inputs are constants materialized into VGPRs. This may happen in cases where for instance several instructions use an immediate zero and SelectionDAG chooses to put it in a VGPR to satisfy all of them. This however causes the si-fix-sgpr-copies to try to switch the whole chain to VGPR and may lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into an SGPR fixes the issue.	2023-09-25 13:20:08 +02:00
Simon Pilgrim	8b36d082c4	[DAG] getNode() - fold (zext (trunc x)) -> x iff the upper bits are known zero - add SRL support This is part of the work to address the D155472 regressions, there's a number of issues with generalizing this fold which is why I'm just adding SRL support atm. Differential Revision: https://reviews.llvm.org/D159533	2023-09-24 13:40:07 +01:00
Simon Pilgrim	142efd6d61	[AMDGPU] Add ISD::FSHR Handling to AMDGPUISD::PERM matching Pulled out of D159533, which encourages (zext (trunc x)) -> x folds, leading to more ISD::FSHR nodes, which was breaking some existing AMDGPUISD::PERM tests Differential Revision: https://reviews.llvm.org/D159533	2023-09-24 13:40:07 +01:00
Ivan Kosarev	fab28e0e14	Reapply "[AMDGPU] Introduce real and keep fake True16 instructions." Reverts 6cb3866b1ce9d835402e414049478cea82427cf1. Analysis of failures on buildbots with expensive checks enabled showed that the problem was triggered by changes in another commit, 469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug addressed in #67245.	2023-09-23 22:07:41 +01:00
Ivan Kosarev	6cb3866b1c	Revert "[AMDGPU] Introduce real and keep fake True16 instructions." This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to failures on expensive checks.	2023-09-22 15:40:26 +01:00
Mirko Brkusanin	72e3713009	[IRTranslator] Set NUW flag for inbounds gep and load/store offsets Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D159515	2023-09-22 16:16:28 +02:00
Mirko Brkusanin	a657deb42e	[AMDGPU] Update RUN line in test (NFC)	2023-09-22 12:41:54 +02:00
Ivan Kosarev	c62f208c05	[AMDGPU] Don't suppress printing the .l and .h register suffixes. We don't seem to have a use for the -amdgpu-keep-16-bit-reg-suffixes option anymore. Was introduced in <https://reviews.llvm.org/D79435>. Reviewed By: Joe_Nash, foad Differential Revision: https://reviews.llvm.org/D156102	2023-09-22 11:13:05 +01:00
Ivan Kosarev	0f864c7b8b	[AMDGPU] Introduce real and keep fake True16 instructions. The existing fake True16 instructions using 32-bit VGPRs are supposed to co-exist with real ones until all the necessary True16 functionality is implemented and relevant tests are updated. Reviewed By: arsenm, Joe_Nash Differential Revision: https://reviews.llvm.org/D156101	2023-09-22 10:57:56 +01:00
Ivan Kosarev	469b3bfad2	[AMDGPU] Add True16 register classes. Reviewed By: rampitec, Joe_Nash Differential Revision: https://reviews.llvm.org/D156099	2023-09-22 10:17:02 +01:00
Sirish Pande	e6f9483f77	[SelectionDAG] Flags are dropped when creating a new FMUL (#66701) While simplifying some vector operators in DAG combine, we may need to create new instructions for simplified vectors. At that time, we need to make sure that all the flags of the new instruction are copied/modified from the old instruction. If "contract" is dropped from an instruction like FMUL, it may not generate FMA instruction which would impact performance. Here's an example where "contract" flag is dropped when FMUL is created. Replacing.2 t42: v2f32 = fmul contract t41, t38 With: t48: v2f32 = fmul t38, t38 Co-authored-by: Sirish Pande <sirish.pande@amd.com>	2023-09-21 10:26:34 -05:00
Jeffrey Byrnes	acb4854563	[AMDGPU] Precommit test for D159533 (#66965) Precommit test ahead of https://reviews.llvm.org/D159533 for ISD::FSHR / AMDGPUISD::PERM combine	2023-09-21 12:17:59 +01:00
Mirko Brkušanin	ecfdc23dd2	[AMDGPU] Select gfx1150 SALU Float instructions (#66885)	2023-09-21 12:22:55 +02:00
Pierre van Houtryve	fe2f67e4ba	[AMDGPU] Remove Code Object V2 (#65715) Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM. - [clang] Remove support for the `-mcode-object-version=2` option. - [lld] Remove/refactor tests that were still using COV2 - [llvm] Update AMDGPUUsage.rst - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while. - [AMDGPU] Remove COV2 emission capabilities. - [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2 - [AMDGPU] Update all tests that were still using COV2 - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).	2023-09-21 12:00:45 +02:00
Noah Goldstein	47c642f9a0	[DAGCombiner] Fold IEEE `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp Note: This is moving D154678 which previously implemented this in InstCombine. Concerns were brought up that this was de-canonicalizing and really targeting a codegen improvement, so placing in DAGCombiner. This implements: ``` (fmul C, (uitofp Pow2)) -> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa)) (fdiv C, (uitofp Pow2)) -> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa)) ``` The motivation is mostly fdiv where 2^(-p) is a fairly common expression. The patch is intentionally conservative about the transform, only doing so if we: 1) have IEEE floats 2) C is normal 3) add/sub of max(Log2(Pow2)) stays in the min/max exponent bounds. Alive2 can't realistically prove this, but did test float16/float32 cases (within the bounds of the above rules) exhaustively. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D154805	2023-09-20 13:28:24 -05:00
Noah Goldstein	32a46919a2	[AMDGPU] Add tests for folding `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp; NFC Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D159405	2023-09-20 13:28:24 -05:00
Sirish Pande	cc3491fd45	[SelectionDAG] [NFC] Add pre-commit test for PR66701. (#66796) Co-authored-by: Sirish Pande <sirish.pande@amd.com>	2023-09-20 11:37:18 -05:00
Simon Pilgrim	2ec697b4c7	[AMDGPU] Regenerate always-uniform.ll	2023-09-20 16:58:00 +01:00
Joe Nash	2c0f2b510c	[AMDGPU] Convert tests rotr.ll and rotl.ll to be auto-generated (#66828) and add GFX11 coverage. NFC	2023-09-20 10:32:04 -04:00
Jay Foad	a68c7241ec	[AMDGPU] Run twoaddr tests with -early-live-intervals (#66775) Sample test case: %3 = V_FMAC_F32_e32 killed %0, %1, %2, implicit $mode, implicit $exec With LiveVariables this is converted to three-address form just because there is no "killed" flag on %2. To make it do the same thing with LiveIntervals I added a later use of %2: %3 = V_FMAC_F32_e32 killed %0, %1, %2, implicit $mode, implicit $exec S_ENDPGM 0, implicit %2	2023-09-20 08:22:00 +01:00
Dhruv Chawla	3e992d81af	[InferAlignment] Enable InferAlignment pass by default This gives an improvement of 0.6%: https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u Differential Revision: https://reviews.llvm.org/D158600	2023-09-20 12:08:52 +05:30
Austin Kerbow	60a227c464	[AMDGPU] Use inreg for hint to preload kernel arguments This patch is the first in a series that adds support for pre-loading kernel arguments into SGPRs. The command-line argument 'amdgpu-kernarg-preload-count' is used to specify the number of arguments sequentially from the first that we should attempt to preload, the default is 0. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156852	2023-09-19 15:13:38 -07:00
Jay Foad	e0919b189b	[CodeGen] Renumber slot indexes before register allocation (#66334) RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps. This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.	2023-09-19 11:18:12 +01:00
Jay Foad	1d305f95d6	[AMDGPU] Fix line endings in a test	2023-09-19 11:09:03 +01:00
Matt Arsenault	1328a8534b	AMDGPU: Fix handling of -0 in round lowering (#65761)	2023-09-19 09:14:17 +03:00
Guozhi Wei	cbdccb30c2	[RA] Split a virtual register in cold blocks if it is not assigned preferred physical register If a virtual register is not assigned preferred physical register, it means some COPY instructions will be changed to real register move instructions. In this case we can try to split the virtual register in colder blocks, if success, the original COPY instructions can be deleted, and the new COPY instructions in colder blocks will be generated as register move instructions. It results in fewer dynamic register move instructions executed. The new test case split-reg-with-hint.ll gives an example, the hot path contains 24 instructions without this patch, now it is only 4 instructions with this patch. Differential Revision: https://reviews.llvm.org/D156491	2023-09-15 19:52:50 +00:00
Benjamin Kramer	3454cf67bd	Revert "[MachineLICM] Handle Subloops" This reverts commit 5ec9699c4d1f165364586d825baef434e2c110b4. It accesses MI after it has been hoisted.	2023-09-15 13:20:31 +02:00
Jay Foad	ceb68eea8c	[AMDGPU] Remove repeated -mtriple options from RUN lines (#66486)	2023-09-15 11:29:24 +01:00
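
Note on 47c642f9a0 (fold `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp): the identity in that commit message can be checked outside LLVM with a few lines of Python. This is only a sketch of the arithmetic, not the DAGCombiner code. It assumes IEEE binary32, a normal `C`, and a result exponent that stays in range, and it performs none of the bounds checks the patch describes. The helper names (`f32_to_bits`, `bits_to_f32`, `fmul_pow2_via_add`, `fdiv_pow2_via_sub`) are hypothetical and exist only for this illustration.

```python
import struct

MANTISSA_BITS = 23  # width of the binary32 mantissa field


def f32_to_bits(x: float) -> int:
    """Reinterpret a value stored as IEEE binary32 as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def fmul_pow2_via_add(c: float, log2_pow2: int) -> float:
    """c * 2**log2_pow2, computed by adding to the exponent field.

    Assumes c is a normal binary32 value and the exponent does not overflow.
    """
    return bits_to_f32(f32_to_bits(c) + (log2_pow2 << MANTISSA_BITS))


def fdiv_pow2_via_sub(c: float, log2_pow2: int) -> float:
    """c / 2**log2_pow2, computed by subtracting from the exponent field.

    Assumes c is a normal binary32 value and the exponent does not underflow.
    """
    return bits_to_f32(f32_to_bits(c) - (log2_pow2 << MANTISSA_BITS))


if __name__ == "__main__":
    print(fmul_pow2_via_add(3.0, 4))   # 3.0 * 16
    print(fdiv_pow2_via_sub(3.0, 4))   # 3.0 / 16
```

The integer add or subtract touches only the exponent field, so the sign and mantissa bits of `C` pass through unchanged. This is why the transform is restricted to normal inputs whose exponent stays within the representable range.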


6824 Commits