llvm-project

Author	SHA1	Message	Date
Piotr Sobczak	adf02ae41f	[AMDGPU] Simplify lowerBUILD_VECTOR (#109094 ) Simplify `lowerBUILD_VECTOR` by commoning up the way the vectors are split. Also reorder the checks to avoid a long condition inside `if`.	2024-09-18 12:58:16 +02:00
Aditi Medhane	5a8d2dd1f9	[AMDGPU] Handle subregisters properly in generic operand legalizer (#108496 ) Fix for an issue with COPY introduction during legalization of PHI operands for SGPR-to-VGPR copies when a subregister is involved.	2024-09-18 13:14:49 +05:30
Thorsten Schütt	acfa294b5e	[GlobalIsel] Canonicalize G_FCMP (#108891 ) As a side-effect, we start constant folding fcmps.	2024-09-17 09:42:04 +02:00
Thorsten Schütt	5c348f692a	[GlobalIsel] Canonicalize G_ICMP (#108755 ) As a side-effect, we start constant folding icmps. Split out from https://github.com/llvm/llvm-project/pull/105991.	2024-09-16 19:25:34 +02:00
Stanislav Mekhanoshin	18f1c980bc	[AMDGPU] Avoid unneeded waitcounts before spill stores (#108303 ) Implicit defs and uses on spill stores were accounted as real defs and uses, while they only exist for liveness accounting. As a result, unneeded waits were generated. Fixes: SWDEV-484177	2024-09-14 02:22:28 -07:00
Stanislav Mekhanoshin	d0e7714de7	[AMDGPU] Error on non-global pointer with s_prefetch_data (#107624 )	2024-09-13 11:14:28 -07:00
Diana Picus	3356208531	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512 ) This reverts commit `7792b4ae79`. The problem was a conflict with `e55d6f5ea2` "[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (https://github.com/llvm/llvm-project/pull/107889)" which changed the syntax of V_SET_INACTIVE (and thus made my MIR test crash). ...if only we had a merge queue.	2024-09-13 11:54:30 +02:00
Pankaj Dwivedi	991c842b38	[AMDGPU] eliminate frame index v_add wave32 test (#107832 ) PR: #102346 v_add_u32_e64 test cases for wave32	2024-09-13 13:55:14 +05:30
Aditi Medhane	5237f0dbcb	[AMDGPU] Precommit and Modify `phi_moveimm_subreg_input` testcase (#108389 ) - Updated `phi_moveimm_subreg_input` test case to introduce sub-registers as PHI input operands. Currently the subreg puts the test case in non-SSA form; this needs to be fixed by giving the subreg as an input operand to the PHI instead of defining the subreg register. This change is relevant for: [[AMDGPU] Add MachineVerifier check to detect illegal copies from vector register to SGPR](https://github.com/llvm/llvm-project/pull/105494)	2024-09-12 19:13:00 +05:30
Jay Foad	c657a6f6aa	[AMDGPU] Fix selection of s_load_b96 on GFX11 (#108029 ) Fix a bug which resulted in selection of s_load_b96 on GFX11, which only exists in GFX12. The root cause was a mismatch between legalization and selection. The condition used to check that the load was uniform in legalization (SITargetLowering::LowerLOAD) was "!Op->isDivergent()". The condition used to detect a non-uniform load during selection (AMDGPUDAGToDAGISel::isUniformLoad()) was "N->isDivergent() && !AMDGPUInstrInfo::isUniformMMO(MMO)". This makes a difference when IR uniformity analysis has more information than SDAG's built in analysis. In the test case this is because IR UA reports that everything is uniform if isSingleLaneExecution() returns true, e.g. if the specified max flat workgroup size is 1, but SDAG does not have this optimization. The immediate fix is to use the same condition to detect uniform loads in legalization and selection. In future SDAG should learn about isSingleLaneExecution(), and then it could probably stop relying on IR metadata to detect uniform loads.	2024-09-12 13:41:40 +01:00
Aditi Medhane	36ad0720de	[AMDGPU] Autogenerate checks for phi-vgpr-input-moveimm.mir (#108372 ) Update the MIR checks for phi-vgpr-input-moveimm testcase.	2024-09-12 17:28:24 +05:30
Diana Picus	7792b4ae79	Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )"" (#108341 ) Reverts llvm/llvm-project#108173 si-init-whole-wave.mir crashes on some buildbots (although it passed both locally with sanitizers enabled and in pre-merge tests). Investigating.	2024-09-12 10:12:09 +02:00
Diana Picus	703ebca869	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )" (#108173 ) This reverts commit `c7a7767fca`. The buildbots failed because I removed a MI from its parent before updating LIS. This PR should fix that.	2024-09-12 09:11:41 +02:00
Jay Foad	e55d6f5ea2	[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889 ) Always generate v_cndmask_b32 instead of modifying exec around v_mov_b32. This is expected to be faster because modifying exec generally causes pipeline stalls.	2024-09-11 17:16:06 +01:00
Brox Chen	35e27c0ee5	[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510 ) This is a large patch that includes the MC-level support for V_CVT_F16_F32, V_CVT_F32_F16 and V_LDEXP_F16 in true16 format. This patch includes the asm/disasm changes to encode/decode the 16-bit vsrc, vdst and src modifiers for the VOP and DPP formats. This patch is a dependency for many 16-bit instructions, while only three instructions are updated to make it easier to review. There will be another patch to support these three instructions at the CodeGen level; this patch just replaces these two instructions with their fake16 format.	2024-09-11 10:48:11 -04:00
Matt Arsenault	ee61a4db3c	AMDGPU: Add tests for minimumnum/maximumnum intrinsics Vector cases are broken, so leave those for later.	2024-09-11 18:20:03 +04:00
Thorsten Schütt	ba4bcce5f5	[GlobalIsel] Combine trunc of binop (#107721 ) trunc (binop X, C) --> binop (trunc X, trunc C) --> binop (trunc X, C`) Try to narrow the width of math or bitwise logic instructions by pulling a truncate ahead of binary operators. Vx and Nx cores consider 32-bit and 64-bit basic arithmetic equal in costs.	2024-09-11 15:04:55 +02:00
Akshat Oke	e1ee07d0ff	[AMDGPU][NewPM] Port SIPeepholeSDWA pass to NPM (#107049 )	2024-09-11 14:30:16 +04:00
Simon Pilgrim	704116373a	[AMDGPU] Regenerate buffer intrinsic tests with update_llc_test_checks. NFC.	2024-09-11 11:06:55 +01:00
Vitaly Buka	c7a7767fca	Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 ) Breaks bots, see #105822. Reverts llvm/llvm-project#105822	2024-09-10 09:51:43 -07:00
Diana Picus	44556e64f2	[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822 ) This intrinsic is meant to be used in functions that have a "tail" that needs to be run with all the lanes enabled. The "tail" may contain complex control flow that makes it unsuitable for the use of the existing WWM intrinsics. Instead, we will pretend that the function starts with all the lanes enabled, then branches into the actual body of the function for the lanes that were meant to run it, and then finally all the lanes will rejoin and run the tail. As such, the intrinsic will return the EXEC mask for the body of the function, and is meant to be used only as part of a very limited pattern (for now only in amdgpu_cs_chain functions): ``` entry: %func_exec = call i1 @llvm.amdgcn.init.whole.wave() br i1 %func_exec, label %func, label %tail func: ; ... stuff that should run with the actual EXEC mask br label %tail tail: ; ... stuff that runs with all the lanes enabled; ; can contain more than one basic block ``` It's an error to use the result of this intrinsic for anything other than a branch (but unfortunately checking that in the verifier is non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if between the intrinsic and the branch). The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now is expanded in si-wqm (which is where SI_INIT_EXEC is handled too); however the information that the function was conceptually started in whole wave mode is stored in the machine function info (hasInitWholeWave). This will be useful in prolog epilog insertion, where we can skip saving the inactive lanes for CSRs (since if the function started with all the lanes active, then there are no inactive lanes to preserve).	2024-09-10 13:24:53 +02:00
sstipanovic	914ab366c2	[AMDGPU] Overload image atomic swap to allow float as well. (#107283 ) LLPC can generate llvm.amdgcn.image.atomic.swap intrinsic with data argument as float type as well as float return type. This went unnoticed until CreateIntrinsic with implicit mangling was used.	2024-09-09 17:54:30 +02:00
Pierre van Houtryve	eaac4a2613	[AMDGPU] Document & Finalize GFX12 Memory Model (#98599 ) Documents the memory model implemented as of #98591, with some fixes/optimizations to the implementation.	2024-09-09 15:35:28 +02:00
Stanislav Mekhanoshin	0745219d4a	[AMDGPU] Add target intrinsic for s_buffer_prefetch_data (#107293 )	2024-09-06 11:41:21 -07:00
Shilei Tian	ce2e38653f	[Attributor] Add support for atomic operations in `AAAddressSpace` (#106927 )	2024-09-06 12:45:16 -04:00
Shilei Tian	109cd11dc4	[Attributor] Skip AS specialization for volatile memory instructions (#107250 )	2024-09-06 11:00:30 -04:00
Matt Arsenault	a9daad8280	AMDGPU: Update live intervals in convertToThreeAddress (#104610 ) Fixes #98741	2024-09-06 18:18:27 +04:00
Chaitanya	50be4f17a0	[AMDGPU] Skip lowerNonKernelLDSAccesses if function is declaration. (#106975 ) This PR skips lowering non-kernel LDS (i.e. lowerNonKernelLDSAccesses) when the function is a declaration or there are no LDS globals to process.	2024-09-06 16:04:17 +05:30
Changpeng Fang	24267a7e14	AMDGPU: Add f64 to f32 support for llvm.fptrunc.round (#107481 )	2024-09-05 22:57:27 -07:00
Stanislav Mekhanoshin	bd840a4004	[AMDGPU] Add target intrinsic for s_prefetch_data (#107133 )	2024-09-05 15:14:31 -07:00
Changpeng Fang	e44a67543c	AMDGPU: Add a few unsupported checks for llvm.fptrunc.round intrinsic (#107330 ) A check here can be removed when we implement support for the corresponding types/mode.	2024-09-05 15:05:15 -07:00
Carl Ritson	16cda01d22	[AMDGPU] V_SET_INACTIVE optimizations (#98864 ) Optimize V_SET_INACTIVE by allowing it to run in WWM, so WWM sections are no longer broken up for inactive lane setting. WWM V_SET_INACTIVE can typically be lowered to V_CNDMASK. Some cases still require the exec-manipulating V_MOV sequence used by the previous code. GFX9 sees a slight instruction count increase in edge cases due to its smaller constant bus. Additionally, avoid introducing exec manipulation and V_MOVs where a source of V_SET_INACTIVE is the destination. This is a common pattern, as WWM register pre-allocation often assigns the same register.	2024-09-05 14:39:28 +09:00
Juan Manuel Martinez Caamaño	2d7339ad24	[AMDGPU][LDS] Fix dynamic LDS interaction with "amdgpu-no-lds-kernel-id" (#107092 ) Dynamic LDS and table LDS both use the amdgpu_lds_kernel_id intrinsic. Kernels and functions that make an indirect use of this should not have the "amdgpu-no-lds-kernel-id" attribute. For the latter (table LDS), this was already done; for the dynamic LDS case, it was missing. This patch fixes it.	2024-09-04 16:41:43 +02:00
Christudasan Devadasan	6c143a86cd	[CodeGen][NewPM] Port MachineCSE pass to new pass manager. (#106605 )	2024-09-04 18:54:07 +05:30
Juan Manuel Martinez Caamaño	43b8ae3cea	[AMDGPU][LDS] Pre-Commit tests for 'Fix dynamic LDS interaction with "amdgpu-no-lds-kernel-id" (#107091 )	2024-09-04 13:11:45 +02:00
Jay Foad	5a6926ce49	[AMDGPU] Fix test update after #107108	2024-09-04 11:48:08 +01:00
Jay Foad	126d6f2710	[AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108 ) Use poison for an unused input to the permlanex16 intrinsic, to improve register allocation and avoid an unnecessary v_mov instruction.	2024-09-04 11:03:22 +01:00
paperchalice	69657eb7f6	[llc] Provide `opt` like verifier options (#106665 ) - Support `verify-each` option. - Default behavior is verifying output only.	2024-09-04 17:37:34 +08:00
Carl Ritson	86627149f6	[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067 ) Any SGPR read by a VALU can potentially obscure SALU writes to the same register. Insert s_wait_alu instructions to mitigate the hazard on affected paths. Compute a global cache of SGPRs with any VALU reads and use this to avoid inserting mitigation for SGPRs never accessed by VALUs. To avoid excessive searching when compile time is a priority, implement a secondary mode in which all SALU writes are mitigated. Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-09-04 12:15:20 +09:00
Christudasan Devadasan	042104985c	[AMDGPU][NewPM] Port SIShrinkInstructions to new pass manager. (#106967 )	2024-09-03 10:52:50 +05:30
Shilei Tian	cb949b74e8	[NFC][FIX] Work around update_test_checks bug	2024-09-02 12:33:24 -04:00
Shilei Tian	f32f0289fd	[NFC] Update check lines of the test case `llvm/test/CodeGen/AMDGPU/remove-no-kernel-id-attribute.ll`	2024-09-02 12:23:26 -04:00
Akshat Oke	da13754103	AMDGPU/NewPM Port SILoadStoreOptimizer to NPM (#106362 )	2024-09-02 11:41:56 +05:30
Changpeng Fang	26b0bef192	AMDGPU: Use pattern to select instruction for intrinsic llvm.fptrunc.round (#105761 ) Use GCNPat instead of Custom Lowering to select instructions for intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as a predicate to select only when the rounding mode is supported. "as_hw_round_mode : SDNodeXForm" is developed to translate the round modes to the corresponding ones that hardware recognizes.	2024-08-29 11:43:58 -07:00
Stephen Tozer	3d08ade7bd	[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149 ) This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for this pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850 This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA). Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>	2024-08-29 17:53:32 +01:00
Pierre van Houtryve	1f8f2ed66a	[NFC][AMDGPU] Autogenerate tests for uniform i32 promo in ISel (#106382 ) Many tests were easy to update, but these are quite big and I think it's better to autogenerate them to see the difference well.	2024-08-29 15:20:32 +02:00
Matt Arsenault	7b7b0b95b2	DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (#105577 ) For some reason, isOperationLegalOrCustom is not the same as isOperationLegal || isOperationCustom. Unfortunately, it checks if the type is legal, which makes it useless for custom lowering on non-legal types (which is always ppcf128). Really, the DAG builder shouldn't be expanding this in the builder; it makes it difficult to work with. It's only here to work around the DAG requiring legal integer types the same size as the FP type after type legalization.	2024-08-29 14:05:43 +04:00
Akshat Oke	fdca2c33a1	AMDGPU/NewPM Port GCNDPPCombine to NPM (#105816 ) Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>	2024-08-29 14:49:52 +05:30
Akshat Oke	2adc94cd6c	AMDGPU/NewPM: Port SIFoldOperands to new pass manager (#105801 )	2024-08-29 11:34:54 +05:30
Shilei Tian	572d2fd327	[Attributor] Fix an issue that could potentially cause `AccessList` and `OffsetBins` out of sync (#106187 ) The implementation of `AAPointerInfo::RangeList::set_difference` doesn't consider the case where two ranges have the same offset but different sizes. This could cause `AccessList` and `OffsetBins` to get out of sync because a range has already been updated in `AccessList` but is missing from `ToRemove`. I do have a reproducer, but the reproducer itself is 248kb. `llvm-reduce` can't further reduce it. Not sure how I can make a smaller reproducer. Fixes: SWDEV-479757.	2024-08-29 01:02:19 -04:00


7767 Commits
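A minimal sketch of what canonicalizing an integer compare can look like (commit 5c348f692a). When only the left-hand operand is a constant, the operands and the predicate are swapped so the constant ends up on the right. When both operands are constants, the compare is folded outright. The `Pred`, `Operand` and `ICmp` types are hypothetical stand-ins, not GlobalISel's API, and the exact canonical form chosen by the real combiner is an assumption here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Integer compare predicates, named after LLVM's icmp predicates.
enum class Pred { EQ, NE, SLT, SLE, SGT, SGE, ULT, ULE, UGT, UGE };

// Predicate to use after swapping operands: (a < b) is the same as (b > a).
Pred swapPred(Pred p) {
  switch (p) {
  case Pred::SLT: return Pred::SGT;
  case Pred::SGT: return Pred::SLT;
  case Pred::SLE: return Pred::SGE;
  case Pred::SGE: return Pred::SLE;
  case Pred::ULT: return Pred::UGT;
  case Pred::UGT: return Pred::ULT;
  case Pred::ULE: return Pred::UGE;
  case Pred::UGE: return Pred::ULE;
  default: return p; // EQ and NE are symmetric.
  }
}

// Hypothetical operand: a known 64-bit constant, or an opaque virtual register.
struct Operand {
  std::optional<int64_t> imm;
  int vreg = -1;
};

struct ICmp {
  Pred pred;
  Operand lhs, rhs;
};

// Move a lone constant to the right-hand side.
ICmp canonicalize(const ICmp &c) {
  if (c.lhs.imm && !c.rhs.imm)
    return ICmp{swapPred(c.pred), c.rhs, c.lhs};
  return c;
}

// Fold the compare when both operands are constants.
std::optional<bool> constantFold(const ICmp &c) {
  if (!c.lhs.imm || !c.rhs.imm)
    return std::nullopt;
  const int64_t a = *c.lhs.imm, b = *c.rhs.imm;
  const uint64_t ua = static_cast<uint64_t>(a), ub = static_cast<uint64_t>(b);
  switch (c.pred) {
  case Pred::EQ: return a == b;
  case Pred::NE: return a != b;
  case Pred::SLT: return a < b;
  case Pred::SLE: return a <= b;
  case Pred::SGT: return a > b;
  case Pred::SGE: return a >= b;
  case Pred::ULT: return ua < ub;
  case Pred::ULE: return ua <= ub;
  case Pred::UGT: return ua > ub;
  case Pred::UGE: return ua >= ub;
  }
  return std::nullopt;
}
```

Because the constant always sits on the right after canonicalization, later combines only need to match one operand order.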
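The identity behind the trunc-of-binop combine (commit ba4bcce5f5) can be checked with plain unsigned arithmetic. For add, sub, mul, and, or and xor, the low N bits of the result depend only on the low N bits of the inputs. Truncating after the 64-bit operation therefore gives the same value as operating on truncated 32-bit inputs. This sketch models only the values, not the combiner itself. The `BinOp` enum and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class BinOp { Add, Sub, Mul, And, Or, Xor };

// Unsigned arithmetic in C++ wraps modulo 2^N, like LLVM integer operations.
template <typename T>
T apply(BinOp op, T x, T c) {
  switch (op) {
  case BinOp::Add: return static_cast<T>(x + c);
  case BinOp::Sub: return static_cast<T>(x - c);
  case BinOp::Mul: return static_cast<T>(x * c);
  case BinOp::And: return static_cast<T>(x & c);
  case BinOp::Or:  return static_cast<T>(x | c);
  case BinOp::Xor: return static_cast<T>(x ^ c);
  }
  return T{};
}

// trunc (binop X, C): compute at 64 bits, then keep the low 32 bits.
uint32_t truncOfBinop(BinOp op, uint64_t x, uint64_t c) {
  return static_cast<uint32_t>(apply<uint64_t>(op, x, c));
}

// binop (trunc X, trunc C): truncate first, then compute at 32 bits.
uint32_t binopOfTrunc(BinOp op, uint64_t x, uint64_t c) {
  return apply<uint32_t>(op, static_cast<uint32_t>(x), static_cast<uint32_t>(c));
}
```

The identity fails for operations whose low result bits depend on high input bits, such as a logical right shift or an unsigned division. A combine of this kind must therefore be limited to suitable opcodes.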
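Lowering llvm.amdgcn.set.inactive to v_cndmask_b32 (commits e55d6f5ea2 and 16cda01d22) amounts to a per-lane select. Lanes that were active keep the active value, and the remaining lanes receive the inactive value, with no need to save, modify and restore EXEC around a v_mov_b32. The sketch below models a wave32 VGPR as an array of 32 lane values. `VReg`, `kWaveSize` and `setInactive` are hypothetical names. The operand order and register details of the real instruction are deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWaveSize = 32; // Assumption: wave32, one EXEC bit per lane.
using VReg = std::array<uint32_t, kWaveSize>;

// Per-lane select: lanes set in `execMask` keep `active`,
// all other lanes receive `inactive`. EXEC itself is never modified.
VReg setInactive(const VReg &active, const VReg &inactive, uint32_t execMask) {
  VReg out{};
  for (int lane = 0; lane < kWaveSize; ++lane)
    out[lane] = ((execMask >> lane) & 1u) ? active[lane] : inactive[lane];
  return out;
}
```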
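The selection bug fixed in c657a6f6aa comes down to two predicates that disagree on one input combination. This small truth-table model uses the conditions quoted in the commit message. `sdagDivergent` stands for `N->isDivergent()` and `uniformMMO` for `AMDGPUInstrInfo::isUniformMMO(MMO)`. The function names are hypothetical.

```cpp
#include <cassert>

// Legalization before the fix: uniform iff SDAG's own analysis
// says the node is not divergent ("!Op->isDivergent()").
bool uniformInLegalization(bool sdagDivergent, bool uniformMMO) {
  (void)uniformMMO; // IR uniformity information is not consulted here.
  return !sdagDivergent;
}

// Selection: non-uniform iff "N->isDivergent() && !isUniformMMO(MMO)",
// so the load counts as uniform otherwise.
bool uniformInSelection(bool sdagDivergent, bool uniformMMO) {
  return !(sdagDivergent && !uniformMMO);
}

// The two phases disagree when they classify the same load differently.
bool phasesDisagree(bool sdagDivergent, bool uniformMMO) {
  return uniformInLegalization(sdagDivergent, uniformMMO) !=
         uniformInSelection(sdagDivergent, uniformMMO);
}
```

A mismatch occurs only when SDAG reports the load as divergent while IR uniformity information marks its memory operand as uniform. In that case legalization treats the load as non-uniform, but selection treats it as uniform. Using one condition in both places removes the case.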
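The distinction described in 7b7b0b95b2 can be modelled with simplified versions of the three `TargetLoweringBase` queries. The model assumes, as the commit message states, that `isOperationLegalOrCustom` also requires the value type to be legal, while `isOperationCustom` only looks at the registered action. `Query` and `Action` are hypothetical stand-ins, and the real functions also special-case `MVT::Other`.

```cpp
#include <cassert>

enum class Action { Legal, Custom, Expand };

// One (opcode, type) query: is the value type legal, and which action
// did the target register for the operation on that type?
struct Query {
  bool typeLegal;
  Action action;
};

// Simplified models (the MVT::Other special case is ignored).
bool isOperationLegal(const Query &q) {
  return q.typeLegal && q.action == Action::Legal;
}

bool isOperationCustom(const Query &q) { return q.action == Action::Custom; }

bool isOperationLegalOrCustom(const Query &q) {
  return q.typeLegal &&
         (q.action == Action::Legal || q.action == Action::Custom);
}
```

For a non-legal type such as ppcf128 with a Custom action, `isOperationLegalOrCustom` returns false even though `isOperationCustom` returns true. This is why the commit switches the is_fpclass check to the custom-only query.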
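This is a rough sketch of the bookkeeping described in 86627149f6. It first collects every SGPR read by any VALU in the function. An SALU write then needs mitigation only if it writes one of those SGPRs, or unconditionally in the compile-time-priority mode. The `Inst` record and both functions are hypothetical. The real pass works on machine instructions, follows the affected paths and inserts s_wait_alu, and none of that is modelled here.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, heavily simplified instruction record.
struct Inst {
  bool isVALU = false;
  bool isSALU = false;
  std::vector<int> sgprReads;  // SGPR numbers read
  std::vector<int> sgprWrites; // SGPR numbers written
};

// Global cache: every SGPR that any VALU in the function reads.
std::set<int> collectVALUReadSGPRs(const std::vector<Inst> &func) {
  std::set<int> regs;
  for (const Inst &mi : func)
    if (mi.isVALU)
      regs.insert(mi.sgprReads.begin(), mi.sgprReads.end());
  return regs;
}

// Does this SALU SGPR write need hazard mitigation?
// In the compile-time-priority mode, every SALU write is mitigated.
bool needsMitigation(const Inst &mi, const std::set<int> &valuReadSGPRs,
                     bool fastMode) {
  if (!mi.isSALU || mi.sgprWrites.empty())
    return false;
  if (fastMode)
    return true;
  for (int reg : mi.sgprWrites)
    if (valuReadSGPRs.count(reg))
      return true;
  return false;
}
```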
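The failure mode in 572d2fd327 is a set difference that treats two ranges with the same offset but different sizes as the same element. The sketch below is not the Attributor's `RangeList` code. It is a minimal illustration, assuming ranges are plain (offset, size) pairs, of a set difference that orders and compares by both fields so that such ranges stay distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <tuple>
#include <vector>

struct Range {
  int64_t offset;
  int64_t size;
};

// Order by offset, then by size, so {0, 4} and {0, 8} are distinct elements.
bool rangeLess(const Range &a, const Range &b) {
  return std::tie(a.offset, a.size) < std::tie(b.offset, b.size);
}

// Ranges in `a` that do not appear (same offset AND same size) in `b`.
std::vector<Range> setDifference(std::vector<Range> a, std::vector<Range> b) {
  std::sort(a.begin(), a.end(), rangeLess);
  std::sort(b.begin(), b.end(), rangeLess);
  std::vector<Range> out;
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::back_inserter(out), rangeLess);
  return out;
}
```

With an offset-only comparison, {0, 4} and {0, 8} would be equivalent, and which of them survives the difference would depend on sort order. That is the kind of ambiguity that lets two parallel containers drift apart.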