llvm-project

Author	SHA1	Message	Date
David Stuttard	ce031fc17a	[AMDGPU] Fix non-deterministic iteration order in SIFixSGPRCopies (#66617 ) Use of DenseSet was causing some non-determinism in SIFixSGPRCopies. Changing to SetVector fixes the problem.	2023-09-18 10:08:53 +01:00
Matt Arsenault	1f15e39d81	AMDGPU/GlobalISel: Don't pointlessly check for convergent intrinsics The set of handled intrinsics for fneg combines aren't convergent. The only case we might want to handle is mov_dpp.	2023-09-15 23:32:19 +03:00
Jay Foad	fcbdcb13ce	[AMDGPU] Tweak tuple weight calculation. NFC. (#66490 ) This just makes it more obvious that GCNRegPressure does not actually use pressure sets.	2023-09-15 16:30:06 +01:00
Pierre van Houtryve	e9e3868707	[AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346 ) Addresses the FIXME for both DAGISel and GISel.	2023-09-15 08:11:01 +02:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Pierre van Houtryve	3d0353793b	[AMDGPU] Fix `HasFP32Denormals` check in FDIV32 lowering (#66212 ) Fixes SWDEV-403219	2023-09-14 08:47:10 +02:00
Simon Pilgrim	47a9cd0343	[AMDGPU] Remove constexpr from getNumUserSGPRForField/getMaxNumPreloadedSGPRs to appease older gcc builds Older versions of gcc wouldn't accept the constexpr getNumUserSGPRForField (introduced in D159439 / 343be5132e2831d85) as it couldn't treat the llvm_unreachable call as constexpr	2023-09-13 12:19:28 +01:00
Matt Arsenault	231aa0f212	AMDGPU: Avoid creating vector extracts if we aren't going to do anything Try to avoid expensive checks failures from reporting no changes when some dead instructions were introduced.	2023-09-13 09:45:34 +03:00
Matt Arsenault	edecb60481	Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.	2023-09-13 08:38:48 +03:00
Pravin Jagtap	3755ea93b4	[AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082 ) [D156301](https://reviews.llvm.org/D156301) introduced atomic optimizations for FAdd/FSub. For FSub, reduction/scan needs to be performed using add operation (`not sub`) and memory location will be updated by reduced value using atomic sub later by only one lane. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2023-09-13 09:57:10 +05:30
Jeffrey Byrnes	db47264ab3	Revert "[AMDGPU]: Allow combining into v_dot4" (#66158 ) This reverts commit 7fda1b74be4a173031192d8516869e87e6b7582d.	2023-09-12 16:57:17 -07:00
Kazu Hirata	a9c7ba964f	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/AMDGPU.h:297:18: error: private field 'TM' is not used [-Werror,-Wunused-private-field]	2023-09-12 14:02:07 -07:00
jwanggit86	b853988e0d	[AMDGPU] Port AMDGPURewriteUndefForPHI to new pass manager (#66008 ) This patch ports the AMDGPURewriteUndefForPHI pass to the new pass manager. With this, the pass is supported under both the legacy and the new pass managers. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-09-12 13:32:02 -07:00
Matt Arsenault	c48248d7f9	AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp https://reviews.llvm.org/D158130	2023-09-12 23:23:10 +03:00
Matt Arsenault	72a7024add	AMDGPU: Correctly lower llvm.sqrt.f32 Make codegen emit correctly rounded sqrt by default. Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner. https://reviews.llvm.org/D158129	2023-09-12 23:22:54 +03:00
Kazu Hirata	0bb49afeaf	[AMDGPU] Fix an unused variable warning This patch fixes: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:2493:33: error: unused variable 'UserSGPRInfo' [-Werror,-Wunused-variable]	2023-09-12 12:06:36 -07:00
Austin Kerbow	343be5132e	[AMDGPU] Add utilities to track number of user SGPRs. NFC. Factor out and unify some common code that calculates and tracks the number of user SGPRs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159439	2023-09-12 08:52:30 -07:00
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 )" (#66060 ) This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.	2023-09-12 15:13:59 +05:30
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
Jeremy Morse	e54277fa10	[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0], we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468	2023-09-11 20:01:19 +01:00
Stanislav Mekhanoshin	070c2570ad	[AMDGPU] Global ISel for packed fp32 instructions (#65803 )	2023-09-11 10:48:37 -07:00
Jeremy Morse	6942c64e81	[NFC][RemoveDIs] Prefer iterator-insertion over instructions Continuing the patch series to get rid of debug intrinsics [0], instruction insertion needs to be done with iterators rather than instruction pointers, so that we can communicate information in the iterator class. This patch adds an iterator-taking insertBefore method and converts various call sites to take iterators. These are all sites where such debug-info needs to be preserved so that a stage2 clang can be built identically; it's likely that many more will need to be changed in the future. At this stage, this is just changing the spelling of a few operations, which will eventually become significant once the debug-info bearing iterator is used. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152537	2023-09-11 11:48:45 +01:00
Piotr Sobczak	8fdf61a3bd	[AMDGPU][NFCI] Refactor BUFInstructions.td (#65746 ) Make the code more consistent: - More pattern classes follow the simpler interface with string instead of pseudos. - CmpSwap patterns are encapsulated in SIBufferAtomicCmpSwapPat. - Pseudo store patterns are separated out, similarly to the load counterparts. - MUBUF_Offset_Load_Pat is now GCNPat, as others.	2023-09-11 08:04:22 +02:00
Carl Ritson	1d8a94c4ff	[AMDGPU] SILowerControlFlow: fix preservation of LiveIntervals In emitElse live interval for SI_ELSE source must be recalculated as SI_ELSE is removed, and new user is placed at block start. In emitIfBreak live interval for new created AndReg must be computed. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D158141	2023-09-11 13:46:28 +09:00
Carl Ritson	46ee3b3914	[AMDGPU] SILowerI1Copies: clear kill flags on COPY (#65883 ) Clear kill flags on COPY source as it will be reused.	2023-09-11 12:30:08 +09:00
Matt Arsenault	17bd80601e	AMDGPU: Implement llvm.get.fpmode Currently s_getreg_b32 is missing the possible mode use. Really we need separate pseudos for mode-only accesses, but leave this as a pre-existing issue. https://reviews.llvm.org/D152710	2023-09-10 10:19:19 +03:00
jwanggit86	37b538819b	[AMDGPU] Incorrect error message regarding SCC modifier (#65660 ) For the AMD GFX90A GPU, the SCC instruction modifier is allowed for certain classes of instructions. However, the current assembler generates an error message, "scc is not supported on this GPU", regardless of the instruction. This fix modifies the message as well as the logic for generating the message. Related tests are moved from gfx90a_err.s to gfx90a_asm_features.s. Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-09-08 13:07:06 -07:00
David Stuttard	8c03239934	[AMDGPU] New intrinsic void llvm.amdgcn.s.nop(i16) (#65757 ) This allows front ends to insert s_nops - this is most often when a delay less than s_sleep 1 is required.	2023-09-08 16:24:10 +01:00
Jay Foad	8669a9f93a	[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65765 ) SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD instruction, before we delete the old IMAGE_LOAD instruction. But UpdateNodeOperands can do CSE on the fly and return a different EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still exist and would refer to the old deleted IMAGE_LOAD instruction. This caused errors like: t31: v3i32,ch = <<Deleted Node!>> # D:1 This target-independent node should have been selected! UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209! Fix it by detecting the CSE case and replacing all uses of the original EXTRACT_SUBREG node with the CSE'd one. Recommit with a fix for a use-after-free bug in the first version of this patch (#65340) which was caught by asan.	2023-09-08 16:16:02 +01:00
Jay Foad	dd5af895bb	[AMDGPU] Mark S_NOP as having side effects (#65745 ) This prevents S_NOP from being rescheduled past other (side-effecting) instructions, which is useful because it is generally used to introduce a short delay or to avoid hazards. Currently this only affects MIR tests because the compiler itself only inserts nops in PostRAHazardRecognizer which runs after all scheduling.	2023-09-08 14:05:56 +01:00
Nicolai Hähnle	2eb767c9e1	AMDGPU: Scratch instructions are trivially disjoint from SMEM and buffer instructions (#65287 ) Scratch instructions are always in addrspace(5), which can only alias with flat (and itself). SMEM and buffer instructions can never reference those address spaces, so they are trivially disjoint.	2023-09-08 07:43:36 +02:00
Jeffrey Byrnes	5044531afd	[AMDGPU] Teach CalculateByteProvider about AMDGPUISD::PERM (#65547 ) As a standalone patch, it has limited effect. However, it is necessary as it supports upcoming commits.	2023-09-07 15:13:42 -07:00
Jeffrey Byrnes	7fda1b74be	[AMDGPU]: Allow combining into v_dot4 Differential Revision: https://reviews.llvm.org/D155995 Change-Id: I794f540217f0f84141338757b41b1be0493c7207	2023-09-07 12:58:48 -07:00
Matt Arsenault	4fcc21bbce	AMDGPU: Remove unused node definition	2023-09-07 20:30:14 +03:00
Pierre van Houtryve	30955c9d22	[AMDGPU] Fix V_MOV_B32_indirect inst size (#65584 ) This inst lowers to a normal v_mov_b32 so it's not zero-sized, but has a size of 4. Solves SWDEV-416337	2023-09-07 13:12:58 +02:00
Stanislav Mekhanoshin	0dd4d3b5cc	[AMDGPU] Remove predicate on real packed fp32 instructions (#65589 ) It is copied from the pseudo anyway.	2023-09-07 03:17:25 -07:00
Florian Mayer	42a1d16179	Revert "[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340 )" This reverts commit 11171d81aeafb0c2818f288900423e366a2787fc. Broke ASAN bot.	2023-09-06 13:16:55 -07:00
Jay Foad	11171d81ae	[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340 ) SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD instruction, before we delete the old IMAGE_LOAD instruction. But UpdateNodeOperands can do CSE on the fly and return a different EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still exist and would refer to the old deleted IMAGE_LOAD instruction. This caused errors like: t31: v3i32,ch = <<Deleted Node!>> # D:1 This target-independent node should have been selected! UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209! Fix it by detecting the CSE case and replacing all uses of the original EXTRACT_SUBREG node with the CSE'd one.	2023-09-06 12:51:44 +01:00
Sergei Barannikov	a479be0f39	[MC] Change tryParseRegister to return ParseStatus (NFC) This finishes the work of replacing OperandMatchResultTy with ParseStatus, started in D154101. As a drive-by change, rename some RegNo variables to just Reg (a leftover from the days when RegNo had 'unsigned' type).	2023-09-06 10:28:12 +03:00
Pravin Jagtap	b230472f22	[AMDGPU] Extend v2i16 & v2f16 support for llvm.amdgcn.update.dpp intr (#65318 ) Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2023-09-06 10:20:34 +05:30
pvanhout	4e513f69a1	[GlobalISel] Cleanup Combine.td Now that the old backend is gone, clean-up a few things that no longer make sense and tidy up the file a bit. Depends on D158710 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D158714	2023-09-05 08:19:06 +02:00
pvanhout	aaf6755631	[GlobalISel] Refactor Combiner API Remove CodeGen leftovers from the old combiner backend and adapt the API to fit the new backend better. It's now quite a bit closer to how InstructionSelector works. - `CombinerInfo` is now a simple "options" struct. - `Combiner` is now the base class of all TableGen'd combiner implementation. - Many fields have been moved from derived classes into that class. - It has been refactored to create & own the Observer and Builder. - `tryCombineAll` TableGen'd method can now be renamed, which allows targets to implement the actual `tryCombineAll` call manually and do whatever they want to do before/after it. Note: `CombinerHelper` needs to be mutable because none of its methods are const. This can be revisited later. Depends on D158710 Reviewed By: aemerson, dsanders Differential Revision: https://reviews.llvm.org/D158713	2023-09-05 08:19:05 +02:00
TY-AMD	b1b6c06567	[AMDGPU] Erase ShaderFunctions in AMDGPUPALMetadata::reset() (#65247 )	2023-09-04 08:03:01 -04:00
Matt Arsenault	77c67436d9	LLT: Add some stub constructors for FP types This is to start documenting uses to ease a future migration to supporting different types with the same size. https://reviews.llvm.org/D150605	2023-09-03 08:33:19 -04:00
Matt Arsenault	f7dcabe502	AMDGPU: Pass in TargetMachine to AMDGPULowerModuleLDSPass https://reviews.llvm.org/D157660	2023-09-02 12:02:36 -04:00
Matt Arsenault	1f52060000	AMDGPU: Use poison instead of undef in module lds pass	2023-09-02 11:33:26 -04:00
Matt Arsenault	ee795fd1cf	AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral https://reviews.llvm.org/D158999	2023-09-01 08:22:16 -04:00
Matt Arsenault	def228553c	AMDGPU: Use pown instead of pow if known integral https://reviews.llvm.org/D158998	2023-09-01 08:22:16 -04:00
Matt Arsenault	deefda7074	AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32 These codegen correctly but f64 doesn't. This prevents losing fast math flags on the way to the underlying intrinsic. https://reviews.llvm.org/D158997	2023-09-01 08:22:16 -04:00
Matt Arsenault	dac8f974b5	AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion https://reviews.llvm.org/D158996	2023-09-01 08:22:16 -04:00

8304 Commits