llvm-project

Author	SHA1	Message	Date
Konstantina Mitropoulou	d3508ccd15	[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588 ) - [AMDGPU] Add new test. - [AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. --------- Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>	2024-12-19 11:20:43 -08:00
Brox Chen	4044886c7c	Revert "[AMDGPU][True16][MC] true16 for v_minmax/maxmin_f16 (#119586 )" (#120594 ) This reverts commit e0526b0780f56eede09b05a859a93626ecdc6e4d. The `v_minmax/maxmin_f16`(GFX11) needs to be updated to t16 with `v_minmax/maxmin_num_f16`(GFX12) together since they share the same codegen pattern. Revert the old patch and resubmit	2024-12-19 12:10:23 -05:00
Craig Topper	f139bde8d8	[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536 ) This allows us to write more range based for loops because we no longer need the iterator. It also matches IR's Use class.	2024-12-19 09:07:42 -08:00
Craig Topper	e6b2495545	[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531 ) SDNode::use_iterator now returns an SDUse& when dereferenced. SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses work on use_iterator. SDNode::user_begin/user_end/users work on user_iterator. We can now write range based for loops using SDUse& and SDNode::uses(). I've converted many of these in this patch. I didn't update loops that have additional variables updated in their for statement. Some loops use SDNode::use_iterator::getOperandNo() which also prevents using range based for loops. I plan to move this into SDUse in a follow up patch.	2024-12-19 08:35:32 -08:00
Jay Foad	a161e73fcc	[AMDGPU] Remove unnecessary casts to GCNSubtarget	2024-12-19 15:50:53 +00:00
Jay Foad	056e5eccaf	[AMDGPU] Remove unneeded use of !dag. NFC. (#120546 )	2024-12-19 11:01:59 +00:00
Craig Topper	bd261ecc5a	[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509 ) Most of these are just places that want the first user and aren't iterating over the whole list. While there I changed some use_size() == 1 to hasOneUse() which is more efficient. This is part of an effort to rename use_iterator to user_iterator and provide a use_iterator that dereferences to SDUse&. This patch helps reduce the diff on later patches.	2024-12-18 22:13:04 -08:00
Craig Topper	104ad9258a	[SelectionDAG] Rename SDNode::uses() to users(). (#120499) This function is most often used in range based loops or algorithms where the iterator is implicitly dereferenced. The dereference returns an SDNode * of the user rather than SDUse * so users() is a better name. I've long been annoyed that we can't write a range based loop over SDUse when we need getOperandNo. I plan to rename use_iterator to user_iterator and add a use_iterator that returns SDUse& on dereference. This will make it more like IR.	2024-12-18 20:09:33 -08:00
Brox Chen	e0526b0780	[AMDGPU][True16][MC] true16 for v_minmax/maxmin_f16 (#119586 ) Support true16 format for v_minmax/maxmin_f16 in MC. Since we are replacing `v_minmax/maxmin_f16` to `v_minmax/maxmin_f16_t16 / v_minmax/maxmin_f16_fake16` in Post-GFX11, have to update the CodeGen pattern for `v_minmax/maxmin_f16` to get CodeGen test passing.	2024-12-18 18:04:50 -05:00
Brox Chen	e10b12e656	[AMDGPU][True16][MC] true16 for v_div_fixup_f16 (#119613 ) Support true16 format for v_div_fixup_f16 in MC.	2024-12-18 18:01:13 -05:00
Brox Chen	dc0ea0f945	[AMDGPU][True16][MC] true16 for v_cvt_pknorm_i16/u16_f16 (#119605 ) Support true16 format for v_cvt_pknorm_i16/u16_f16 in MC.	2024-12-18 17:56:34 -05:00
Jun Wang	d57230c72e	[AMDGPU][MC] Disallow op_sel in some VOP3P dot instructions (#100485 ) In v_dot4 and v_dot8 instructions with 4- or 8-bit packed data (e.g., v_dot4_u32_u8, v_dot8_u32_u4), the op_sel modifier should not be allowed.	2024-12-18 10:50:47 -08:00
Brox Chen	c6f753b9a0	[AMDGPU][True16][MC] true16 for v_pack_b32_f16 (#119630) Support true16 format for v_pack_b32_f16 in MC. Since we are replacing v_alignbit_b32 with `v_pack_b32_f16_t16/v_pack_b32_f16_fake16` in Post-GFX11, we have to update the CodeGen pattern for `v_pack_b32_f16_fake16` to get the CodeGen tests passing. No pattern is modified or created; `v_pack_b32_f16` is just replaced with the fake16 format. Some of the true16 CodeGen tests are impacted since `v_pack_b32_f16` selection is removed in Post-GFX11 while `v_pack_b32_f16_t16` is not yet supported. The CodeGen patch for `v_pack_b32_f16_t16` will be done in the following patch.	2024-12-18 13:28:42 -05:00
Brox Chen	c3241a9a4d	[AMDGPU][True16][MC] test update for v_subrev_f16 in true16 (#119315) This is an NFC change. Update mc test for v_subrev_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-18 13:01:08 -05:00
Brox Chen	5270e63cdc	[AMDGPU][True16][MC] test update for v_ldexp_f16 in true16 (#119313) This is an NFC change. Update mc test for v_ldexp_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-18 13:00:07 -05:00
Ruiling, Song	67c55b1ffc	[AMDGPU] Make max dwords of memory cluster configurable (#119342 ) We find it helpful to increase the value for graphics workload. Make it configurable so we can experiment with a different value.	2024-12-18 14:17:27 +08:00
Brox Chen	de2acda3df	[AMDGPU][True16][MC] support more VOP3 inst in true16/fake16 format (#113603 ) Support true16 and fake16 format for more VOP3 instructions in MC This patch updates the true16 and fake16 vop_profile for the following instructions and update the asm/dasm tests: v_mad_u16 v_mad_i16 v_med3_f16 v_med3_i16 v_med3_u16 v_max3_f16 v_max3_i16 v_max3_u16 v_min3_f16 v_min3_i16 v_min3_u16 v_med3_num_f16	2024-12-17 13:58:01 -05:00
Brox Chen	b26f534980	[AMDGPU][True16][MC] test update for v_and/or/xor_b16 in true16 (#119489) This is an NFC change. Update mc test for v_and/or/xor_b16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-17 13:26:59 -05:00
Brox Chen	f9a9173b6c	[AMDGPU][True16][MC] test update for v_mul_f16 in true16 (#119314) This is an NFC change. Update mc test for v_mul_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-17 13:24:32 -05:00
Brox Chen	8bbbcaddbb	[AMDGPU][True16][MC] test update for v_max_f16/v_min_f16 in true16 (#119291) This is an NFC change. Update mc test for v_max/min_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-17 13:12:39 -05:00
Mirko Brkušanin	f7988a338d	[AMDGPU][SIPreEmitPeephole] Fix mustRetainExeczBranch (#120121 ) Do not remove S_CBRANCH_EXECZ if one of the following blocks contains an unconditional branch to a block other than the one immediately following it. This can cause unwanted behavior like infinite loops.	2024-12-17 11:47:38 +01:00
Matt Arsenault	8387cbd0f9	AMDGPU: Delete spills of undef values (#119684) It would be a bit more logical to preserve the undef and do the normal expansion, but this is less work. This avoids verifier errors in a future patch which starts deleting liveness from registers after allocation failures which results in spills of undef values. https://reviews.llvm.org/D122607 Move where undef sgpr spills are deleted	2024-12-17 13:08:38 +07:00
Matt Arsenault	d866005f69	AMDGPU: Do not assert on unhandled types when demangling libcalls (#120068 )	2024-12-16 20:27:06 +07:00
Sergei Barannikov	03847f19f2	[SelectionDAG] Add empty implementation of SelectionDAGInfo to some targets (#119968 ) #119969 adds a couple of new methods to this class, which will need to be overridden by these targets. Part of #119709. Pull Request: https://github.com/llvm/llvm-project/pull/119968	2024-12-16 15:13:46 +03:00
Juan Manuel Martinez Caamaño	ace87ec04c	[AMDGPU][AMDGPURegBankInfo] Map S_BUFFER_LOAD_XXX to its corresponding BUFFER_LOAD_XXX (#117574) In one test, code generation diverged between GISEL and DAG. For example, the intrinsic `%ld = call i8 @llvm.amdgcn.s.buffer.load.u8(<4 x i32> %src, i32 %offset, i32 0)` would be lowered into one of these two cases: `buffer_load_u8 v2, v2, s[0:3], null offen` or `buffer_load_b32 v2, v2, s[0:3], null offen`. This patch fixes this issue.	2024-12-16 10:24:33 +01:00
Matt Arsenault	1100d6a995	AMDGPU: Fix libcall recognition of image array types (#119832 ) Add tests with get_image_width as a sample for all of the non-extension image types. The transform doesn't do anything, but this runs through all the mangled libfunc parsing and shows it does not crash. It would probably be smarter to check for exact match of the types, rather than checking the prefix.	2024-12-16 15:04:53 +09:00
Matt Arsenault	b446c208a5	AMDGPU: Verify function type matches when matching libcalls (#119043 ) Previously this would recognize a call to a mangled ldexp(float, float) as a candidate to replace with the intrinsic. We need to verify the second parameter is in fact an integer. Fixes: SWDEV-501389	2024-12-16 15:01:48 +09:00
Aaditya	0ae75eba67	[AMDGPU] Assert if stack grows downwards. (#119888 )	2024-12-14 17:44:40 +05:30
Kirill Stoimenov	e821f642fd	Revert "[AMDGPU][CodeGen] Do not backtrace invalid -regalloc param (#119687 )" Causes bot failure: https://lab.llvm.org/buildbot/#/builders/55/builds/4246/steps/11/logs/stdio This reverts commit 7a648554f886fbc043c4f3f58ca88f6c4535f2cf.	2024-12-14 03:47:53 +00:00
Matt Arsenault	5f72f2c8fd	AMDGPU: Remove large, negative AddedComplexity from minimum/maximum patterns (#119795 )	2024-12-14 06:17:00 +09:00
Ramkumar Ramachandra	4a0d53a0b0	PatternMatch: migrate to CmpPredicate (#118534) With the introduction of CmpPredicate in 51a895a (IR: introduce struct with CmpInst::Predicate and samesign), PatternMatch is one of the first key pieces of infrastructure that must be updated to match a CmpInst respecting samesign information. Implement this change to Cmp-matchers. This is a preparatory step in migrating the codebase over to CmpPredicate. Since no functional changes are desired at this stage, we have chosen not to migrate CmpPredicate::operator==(CmpPredicate) calls to use CmpPredicate::getMatching(), as that would have visible impact on tests that are not yet written: instead, we call CmpPredicate::operator==(Predicate), preserving the old behavior, while also inserting a few FIXME comments for follow-ups.	2024-12-13 14:18:33 +00:00
Akshat Oke	7a648554f8	[AMDGPU][CodeGen] Do not backtrace invalid -regalloc param (#119687 ) No need to generate a stack trace and a GitHub issue prompt on a wrongly set regalloc option.	2024-12-13 11:58:53 +05:30
paperchalice	1562b70eaf	Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547 ) This relands commit #115111. Use traditional way to update post dominator tree, i.e. break critical edge splitting into insert, insert, delete sequence. When splitting critical edges, the post dominator tree may change its root node, and `setNewRoot` only works in normal dominator tree... See `6c7e5827ed/llvm/include/llvm/Support/GenericDomTree.h (L684-L687)`	2024-12-13 11:43:09 +08:00
Matt Arsenault	5e53a8dadb	AMDGPU: Fix verifier assert with out of bounds subregister indexes (#119799 ) The manual check for aligned VGPR classes would assert if a virtual register used an index not supported by the register class.	2024-12-13 11:52:11 +09:00
Matt Arsenault	37cd7926b7	AMDGPU: Fix entry for mac in VGPR->AGPR MFMA table (#119693 )	2024-12-13 07:53:05 +09:00
choikwa	463e93b95f	Reapply [AMDGPU] prevent shrinking udiv/urem if either operand exceeds signed max (#119325 ) This reverts commit 254d206ee2a337cb38ba347c896f7c6a14c7f218. +Added a fix in ExpandDivRem24 to disqualify if DivNumBits exceed 24. Original commit & msg: ce6e955ac374f2b86cbbb73b2f32174dffd85f25. Handle signed and unsigned path differently in getDivNumBits. Using computeKnownBits, this rejects shrinking unsigned div/rem if operands exceed signed max since we know NumSignBits will be always 0.	2024-12-12 15:24:34 -05:00
Tim Gymnich	2db2dc8ab9	[GlobalISel][NFC] Fix LLT Propagation (#119587 ) Retain LLT type information by creating new LLTs from the original LLT instead of only using the original scalar size. This PR prepares for the [LLT FPInfo RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24) where LLTs will carry additional floating point type information in addition to the scalar size.	2024-12-12 09:47:46 -08:00
Pravin Jagtap	bb1961ed77	[AMDGPU] Stop using True16 profile for v_bitop3_b16 of gfx950. (#119706 )	2024-12-12 20:12:08 +05:30
Pravin Jagtap	bdaa82a7bb	[AMDGPU] Mark AGPR tuple implicit in the first instr of AGPR spills. (#115285 ) When AGPRs are spilled to stack through VGPRs, the pei only marks the AGPR tuple as implicit-def. To preserve the liveness, it should also mark the tuple implicit. Fixes: SWDEV-462189	2024-12-12 19:47:17 +05:30
Akshat Oke	0876c11cee	[AMDGPU] Parse wwm filter flag for regalloc fast (#119347 )	2024-12-12 13:51:02 +05:30
Matt Arsenault	ea632e1b34	Reapply "DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm" (#119575 ) (#119634 ) This reverts commit 40986feda8b1437ed475b144d5b9a208b008782a. Reapply with fix to prevent temporary Twine from going out of scope.	2024-12-11 16:01:48 -08:00
Shilei Tian	f4037277bb	[AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute (#114438 )	2024-12-11 16:50:06 -05:00
Shilei Tian	7dbd6cd294	[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357 ) If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by taking its value directly; otherwise, it uses the default range as a starting point. We will no longer manipulate the known range, which can cause issues because the known range is a "throttle" to the assumed range such that the assumed range can't get widened properly in `updateImpl` if the known range is not set properly for whatever reasons. Another benefit of not touching the known range is, if we indicate pessimistic state, it also invalidates the AA such that `manifest` will not be called. Since we honor the attribute, we don't want and will not add any half-baked attribute added to a function.	2024-12-11 16:47:51 -05:00
Sergei Barannikov	6b2232606d	[TableGen] Replace WantRoot/WantParent SDNode properties with flags (#119599 ) These properties are only valid on ComplexPatterns. Having them as flags is more convenient because one can now use "let = ... in" syntax to set these flags on several patterns at a time. This is also less error-prone as it makes it impossible to specify these properties on records derived from SDPatternOperator. Pull Request: https://github.com/llvm/llvm-project/pull/119599	2024-12-12 00:41:44 +03:00
Vitaly Buka	40986feda8	Revert "DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm" (#119575 ) Reverts llvm/llvm-project#119485 Breaks builders, details in llvm/llvm-project#119485	2024-12-11 07:51:36 -08:00
Pravin Jagtap	5e007afa9d	[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589) Presently, the compiler selectively adds a nop when opsel != 0, i.e. only when partially writing to high bytes. Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop for the above cases irrespective of opsel values. Note: We might need to add a few others into the same table.	2024-12-11 18:38:10 +05:30
Jay Foad	8eb12f6775	[AMDGPU] Support s_endpgm_ordered_ps_done on GFX11 (#119230 ) Support assembly/disassembly of this instruction for compatibility with SP3, even though it has no use in GFX11. It is fully removed in GFX12.	2024-12-11 11:48:36 +00:00
Matt Arsenault	884f2ad6f9	DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm (#119485 ) Currently LLVMContext::emitError emits any error as an "inline asm" error which does not make any sense. InlineAsm appears to be special, in that it uses a "LocCookie" from srcloc metadata, which looks like a parallel mechanism to ordinary source line locations. This meant that other types of failures had degraded source information reported when available. Introduce some new generic error types, and only use inline asm in the appropriate contexts. The DiagnosticInfo types are still a bit of a mess, and I'm not sure why DiagnosticInfoWithLocationBase exists instead of just having an optional DiagnosticLocation in the base class. DK_Generic is for any error that derives from an IR level instruction, and thus can pull debug locations directly from it. DK_GenericWithLoc is functionally the generic codegen error, since it does not depend on the IR and instead can construct a DiagnosticLocation from the MI debug location.	2024-12-11 17:16:07 +09:00
paperchalice	553058f825	Revert "[DomTreeUpdater] Move critical edge splitting code to updater" (#119512 ) Reverts llvm/llvm-project#115111 Causes #119511	2024-12-11 14:25:17 +08:00
paperchalice	79047fac65	[DomTreeUpdater] Move critical edge splitting code to updater (#115111 ) Support critical edge splitting in dominator tree updater. Continue the work in #100856. Compile time check: https://llvm-compile-time-tracker.com/compare.php?from=87c35d782795b54911b3e3a91a5b738d4d870e55&to=42b3e5623a9ab4c3648564dc0926b36f3b438a3a&stat=instructions%3Au	2024-12-11 11:31:42 +08:00

9997 Commits