llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	2fdfea088c	[AMDGPU] Add v2i32 to the VS_64 types. NFCI. (#88318 ) I am trying to use VOP3Inst with intrinsic taking v2i32 operand and it fails to create pattern without it.	2024-04-10 14:50:54 -07:00
Jun Wang	86842e1f72	[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236 ) This patch introduces a new command-line option for clang, namely, amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt instruction is generated after each memory load/store instruction. The counter values are always 0, but which counters are involved depends on the memory instruction. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-04-10 10:47:04 -07:00
David Green	4ac2721e51	[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934 ) This tries to add some costs for the shuffle in a ST3/ST4 instruction, which are represented in LLVM IR as store(interleaving shuffle). In order to detect the store, it needs to add a CxtI context instruction to check the users of the shuffle. LD3 and LD4 are added, LD2 should be a zip1 shuffle, which will be added in another patch. It should help fix some of the regressions from #87510.	2024-04-09 16:36:08 +01:00
Jay Foad	9c58f3a234	[AMDGPU] Fix implicit $vcc operands after parsing MIR (#87781 ) MIParser checks that implicit operands match the instruction definition, so they have to be $vcc even in wave32 mode. Use the mirFileLoaded hook to fix them after MIParser's checks, converting them to $vcc_lo which is what the rest of CodeGen expects. This is all just extending the fixImplicitOperands hack which was introduced with GFX10, but at least it makes it possible to write a MIR test which creates the same instructions that normal CodeGen would generate.	2024-04-09 09:10:45 +01:00
Simon Pilgrim	6fa2d03bbf	AMDGPULowerBufferFatPointers.cpp - fix Wunused-variable warning. NFC.	2024-04-04 14:59:01 +01:00
Simon Pilgrim	24c256a6b7	AMDGPULowerBufferFatPointers.cpp - fix Wparentheses warning. NFC.	2024-04-04 14:59:01 +01:00
Jay Foad	3cf539fb04	[AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539 ) Call generateWaitcnt unconditionally at the end of SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to generate a new waitcnt instruction it has the effect of combining or removing redundant waitcnts that were already present. Tests show various small improvements in waitcnt placement.	2024-04-04 10:14:16 +01:00
Emma Pilkington	607b4bc602	[AMDGPU] Add a missing COV6 case to getAMDHSACodeObjectVersion() (#87492 )	2024-04-03 15:36:58 -04:00
Joe Nash	e29228efae	[AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418 ) Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on GFX1150Plus. The w32 and w64 _e64_dpp assembler-only real instructions were unused, and erroneously constructed in a way that bugged parsing of the new instructions. They are removed. This patch is a follow up to PR https://github.com/llvm/llvm-project/pull/87382	2024-04-03 14:51:27 -04:00
Changpeng Fang	7c68a958e2	AMDGPU: Use PseudoInstr to name SIMCInstr for DSDIR and SOPs, NFC (#87537 ) We should consistently use PseudoInstr instead of Mnemonic to name SIMCInstr, even though they may be the same in most cases	2024-04-03 11:35:09 -07:00
Joe Nash	6a13bbf92f	[AMDGPU][MC] Enables sgpr or imm src1 for float VOP3 DPP, but excludi… (#87382 ) …ng VOPC. Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions should allow sgpr and imm operands. PR #67461 added support for this with int operands, but it was missing a piece for float. Changing VOPC e64_dpp will be in a different patch because there is a bug preventing that change.	2024-04-03 11:34:12 -04:00
Jay Foad	7c7ce0b9b1	[AMDGPU] Remove useless aliases for FLAT instructions. NFC. (#87462 ) We were generating "" (the empty string) as an alias for a bunch of FLAT instructions, which had no effect except to cause tablegen to generate some very long if-else chains in the generated AsmMatcher.	2024-04-03 10:17:25 +01:00
Changpeng Fang	12c7371296	AMDGPU: Use PseudoInstr instead of Pseudo Mnemonic for SIMCInstr, NFC (#87420 ) Pseudo Mnemonic could be of other uses.	2024-04-02 15:51:28 -07:00
Sameer Sahasrabuddhe	421557974a	[AMDGPU] Use glue for convergence tokens at call-like operations (#86766 ) The earlier implementation on AMDGPU used explicit token operands at SI_CALL and SI_CALL_ISEL. This is now replaced with CONVERGENCECTRL_GLUE operands, with the following effects: - The treatment of tokens at call-like operations is now consistent with the treatment at intrinsics. - Support for tail calls using implicit tokens at SI_TCRETURN "just works". - The extra parameter at call-like instructions is eliminated, thus restoring those instructions and their handling to the original state. The new glue node is placed after the existing glue node for the outgoing call parameters, which seems to not interfere with selection of the call-like nodes.	2024-04-01 10:51:13 +05:30
Ruiling, Song	216b5e9666	[AMDGPU] Expose RTZ version of f16 interpolation for gfx11+ (#86614 )	2024-04-01 09:48:37 +08:00
Austin Kerbow	b5b34dbb27	[AMDGPU] Use directive for kernarg preload header padding (#86004 )	2024-03-31 11:03:03 -07:00
Austin Kerbow	0234d90d81	[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768 ) It was shown experimentally that this may have some benefit on newer HW.	2024-03-31 10:46:05 -07:00
Jay Foad	95258419f6	[AMDGPU] Use AMDGPU::isIntrinsicAlwaysUniform in isSDNodeAlwaysUniform (#87085 ) This is mostly just a simplification, but tests show a slight codegen improvement in code using the deprecated amdgcn.icmp/fcmp intrinsics.	2024-03-30 08:01:18 +00:00
Shilei Tian	0a43ca731b	[AMDGPU] Fix missing `IsExact` flag when expanding vector binary operator (#86712 )	2024-03-27 17:40:58 -04:00
Kevin P. Neal	f5296df97c	[FPEnv][AMDGPU] Correct AMDGPUSimplifyLibCalls handling of strictfp attribute. (#86705 ) The AMDGPUSimplifyLibCalls pass was lowering function calls with the strictfp attribute to sequences that included function calls incorrectly lacking the attribute. This patch corrects that. The pass now also emits the correct constrained fp call instead of normal FP instructions when in a function with the strictfp attribute. Replacing non-constrained calls with constrained calls when required is still on the IRBuilder's TODO list.	2024-03-27 10:20:00 -04:00
Janek van Oirschot	1103a2a337	Reland [AMDGPU] MCExpr-ify MC layer kernel descriptor (#86494 ) Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr. Relands #80855 with fixes	2024-03-27 11:59:56 +00:00
Thomas Symalla	256343a0e9	Revert "Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 " (#86273 ) Reverts llvm/llvm-project#81394 This reverts commit 3ac243bc0d7922d083af2cf025247b5698556062. It is not handling RSrc registers s0-s3 correctly. This leads to a broken test, where it expects s0-s3 as function argument and uses it as RSrc register as well. We need to re-visit the patch, but apparently we only want to have s0-s3 as argument registers if we don't need them as RSrc registers.	2024-03-26 11:01:08 +01:00
Changpeng Fang	64a7114702	AMDGPU: Simplify SMInstruction definitions, NFC (#86613 ) Copy OtherPredicates from Pseudo to Real. Real should inherit predicates from the corresponding Pseudo	2024-03-25 21:53:19 -07:00
Changpeng Fang	350bda4419	AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313 ) Rename the intrinsics to close to the instruction mnemonic names: Use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr. This patch also removes f16/bf16 versions of builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types in implementing builtins. We can always use bitcast.	2024-03-25 16:55:22 -07:00
Jeffrey Byrnes	b761137049	[AMDGPU] Use correct VGPR threshold for flagging ExcessRP regions in unified register file case (#85860 ) `ST.getMaxNumVGPRs(MF)` lowers to `AMDGPUBaseInfo.cpp:getTotalNumVGPRs` which returns 512 for gfx90a. This is subsequently limited by `AMDGPUBaseInfo:getAddressableNumVGPRs()`, which also returns 512 for gfx90a. The ISA states we can have a total of 512 registers, but a maximum of only 256 of each of AGPR and VGPR (gfx90a 3.6.4). Therefore, in unified register file case, `ST.getMaxNumVGPRs(MF)` calculates the maximum number of combined VGPR + AGPR. But, it is currently used as the limit for accvgpr and as the limit for archvgpr. This patch uses it as the combined limit, and accounts for the maximum addressable arch/acc VGPRs when calculating the per RegClass limits. It is not unreasonable to think other clients of getTotalNumVGPRs are using it in the wrong way.	2024-03-25 13:11:58 -07:00
David Stuttard	06cfbe3cfd	[AMDPU] Add support for idxen and bothen buffer load/store merging in SILoadStoreOptimizer (#86285 ) Added more buffer instruction merging support	2024-03-25 14:44:22 +00:00
Mariusz Sikora	94a550dab2	[AMDGPU][NFC] Rename Feature GFX11FullVGPRs to 1_5xVGPRs (#86468 )	2024-03-25 11:00:59 +01:00
David Stuttard	75e528fdd9	[AMDGPU] Extend zero initialization of return values for TFE (#85759 ) buffer_load instructions that use TFE also need to zero initialize return values similar to how the image instructions currently work. Add support for this with standard zero init of all results + zero init of just TFE flag when enable-prt-strict-null subtarget feature is disabled.	2024-03-25 09:01:46 +00:00
Pierre van Houtryve	babbdad15b	[AMDGPU] Handle non-register operands for S_SUB/ADD_U64_PSEUDO (#86104 ) This pseudo uses SSrc_b64 so it allows both an immediate or a register, but the lowering crashed on immediate operands.	2024-03-25 09:23:40 +01:00
Sergei Barannikov	5e5b656102	[MC] Make `MCParsedAsmOperand::getReg()` return `MCRegister` (#86444 )	2024-03-25 05:13:48 +03:00
Evgenii Kudriashov	d365a45cb3	[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941 ) Here we introduce three new GMIR instructions to cover a set of trap intrinsics. The idea behind it is that generic intrinsics shouldn't be used with G_INTRINSIC opcode. These new instructions can match perfectly with existing trap ISD nodes. It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for selection and avoid manual selection. However AMDGPU is an exception. It selects traps during legalization regardless of SelectionDAG or GlobalISel. Since there are not many places where traps are used, this change attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So, there is no stage when both G_TRAP and G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.	2024-03-23 13:12:44 +01:00
Craig Topper	fb329f1844	[Target] Move SubRegIdxRanges from MCSubtargetInfo to TargetInfo. (#86245 ) I'm planning to add HwMode support to SubRegIdxRanges for RISC-V GPR pairs. The MC layer is currently unaware of the HwMode for registers and I'd like to keep it that way. This information is not used by the MC layer so I think it is safe to move it.	2024-03-22 11:15:45 -07:00
Pravin Jagtap	e1a8120a63	[AMDGPU] Support double type in atomic optimizer. (#84307 ) Presently the atomic optimizer supports only 32-bit operations. Plan is to extend the atomic optimizer for 64-bit operations for compute and graphics. This patch extends support for double type for `uniform values` only. Going forward, will extend the support for divergent values. Adding support for divergent values requires extending/legalizing readfirstlane, readlane, writelane, etc ops for 64-bit operations to avoid `bitcast` noise that we have currently. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-22 09:25:06 +05:30
paperchalice	a2dfc9ac7d	[NewPM][AMDGPU] Add AMDGPUPassRegistry.def (#86095 ) Move the pass registry to a separate file, prepare for porting dag-isel.	2024-03-22 08:49:29 +08:00
Janek van Oirschot	797336b127	Revert "[AMDGPU] MCExpr-ify MC layer kernel descriptor" (#86151 ) Reverts llvm/llvm-project#80855	2024-03-21 10:19:54 -07:00
Matt Arsenault	d8b0d8d671	AMDGPU: Use defset to cleanup marking MFMA intrinsics as divergent (#85915 )	2024-03-21 21:49:57 +05:30
Janek van Oirschot	857161c367	[AMDGPU] MCExpr-ify MC layer kernel descriptor (#80855 ) Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.	2024-03-21 13:57:10 +00:00
SahilPatidar	3ac243bc0d	Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 (#81394 ) Resolve #78226	2024-03-21 16:52:08 +05:30
Pierre van Houtryve	95a834a16c	(Reland) [AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#85626 ) Reland of #75333	2024-03-21 11:44:47 +01:00
Pierre van Houtryve	ccb3a8feaa	[AMDGPU][LowerModuleLDS] Refactor partially lowered module detection (#85793 ) Refactor the logic that checks if a module contains mixed absolute/non-lowered LDS GVs. The check now happens later when the "worklists" are formed. This is because in some cases (OpenMP) we can have non-lowered GVs in a lowered module, and this is normal because those GVs are just unused and removed from the list at some point before the end of `getUsesOfLDSByFunction`. Doing the check later ensures that if a mixed module is spotted, then it's a _real_ mixed module that needs rejection, not a module containing an intentionally ignored GV.	2024-03-21 11:28:35 +01:00
Matt Arsenault	b6b703b2df	AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948 ) SIMachineFunctionInfo has a scan of the function body for inline asm which may use AGPRs, or callees in SIMachineFunctionInfo. Move this into the attributor, so it actually works interprocedurally. Could probably avoid most of the test churn if this bothered to avoid adding this on subtargets without AGPRs. We should also probably try to delete the MIR scan in usesAGPRs but it seems to be trickier to eliminate.	2024-03-21 14:24:06 +05:30
Stanislav Mekhanoshin	c2fd0e4398	[AMDGPU] Copy SOP properties from pseudo to real. NFCI. (#85997 ) This is to help llvm-objdump to analyze instructions in a future patch.	2024-03-20 13:29:29 -07:00
Jay Foad	abed4b7476	[AMDGPU] Simplify definition of GLOBAL_LOAD_TR Real instructions	2024-03-20 13:54:13 +00:00
Jay Foad	f24d68a107	[AMDGPU] Remove FLAT_Real_AllAddr_gfx11 in favor of GLOBAL_Real_AllAddr_gfx11 Plus some related cleanups. NFC.	2024-03-20 11:23:08 +00:00
Jay Foad	56e3249152	[AMDGPU] Simplify GFX11/GFX12 FLAT instruction definitions. NFC. (#85819 ) - Give the tablegen record for the Real the same name as the tablegen record for the pseudo. This removes all cases where the same instruction name has to be mentioned more than once on the definition line. - Use multiclasses for all Real definitions, to allow suffixes to be added bit by bit, e.g. first _SADDR and then _gfx11. This is a similar approach to the one used in BUFInstructions.td.	2024-03-20 10:04:22 +00:00
Nikita Popov	0f46e31cfb	[IR] Change representation of getelementptr inrange (#84341 ) As part of the migration to ptradd (https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699), we need to change the representation of the `inrange` attribute, which is used for vtable splitting. Currently, inrange is specified as follows: ``` getelementptr inbounds ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, inrange i32 1, i64 2) ``` The `inrange` is placed on a GEP index, and all accesses must be "in range" of that index. The new representation is as follows: ``` getelementptr inbounds inrange(-16, 16) ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, i32 1, i64 2) ``` This specifies which offsets are "in range" of the GEP result. The new representation will continue working when canonicalizing to ptradd representation: ``` getelementptr inbounds inrange(-16, 16) (i8, ptr @vt, i64 48) ``` The inrange offsets are relative to the return value of the GEP. An alternative design could make them relative to the source pointer instead. The result-relative format was chosen on the off-chance that we want to extend support to non-constant GEPs in the future, in which case this variant is more expressive. This implementation "upgrades" the old inrange representation in bitcode by simply dropping it. This is a very niche feature, and I don't think trying to upgrade it is worthwhile. Let me know if you disagree.	2024-03-20 10:59:45 +01:00
Peter Rong	4a026b5092	[AMDGCN] Use ZExt when handling indices in insertment element (#85718 ) When i1 true is used as an index, SExt extends it to i32 -1. This would cause BitVector to overflow. The language manual has specified that the index shall be treated as an unsigned number, this patch fixes that. (https://llvm.org/docs/LangRef.html#insertelement-instruction) This patch fixes #85717 --------- Signed-off-by: Peter Rong <PeterRong96@gmail.com>	2024-03-19 21:44:08 -07:00
Changpeng Fang	ab76052fa9	AMDGPU: Treat SWMMAC the same as MFMA and other WMMA for sched_barrier (#85721 )	2024-03-19 09:58:09 -07:00
Jeremy Morse	b9d83eff25	[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736 ) These are the last remaining "trivial" changes to passes that use Instruction pointers for insertion. All of this should be NFC, it's just changing the spelling of how we identify a position. In one or two locations, I'm also switching uses of getNextNode etc to using std::next with iterators. This too should be NFC. --------- Merged by: Stephen Tozer <stephen.tozer@sony.com>	2024-03-19 16:36:29 +00:00
Pierre van Houtryve	953c13b5c9	[AMDGPU][PromoteAlloca] Whole-function alloca promotion to vector (#84735 ) Update PromoteAllocaToVector so it considers the whole function before promoting allocas. Allocas are scored & sorted so the highest value ones are seen first. The budget is now per function instead of per alloca. Passed internal performance testing.	2024-03-19 11:49:22 +01:00

8995 Commits