llvm-project

Author	SHA1	Message	Date
Guozhi Wei	cbdccb30c2	[RA] Split a virtual register in cold blocks if it is not assigned preferred physical register If a virtual register is not assigned preferred physical register, it means some COPY instructions will be changed to real register move instructions. In this case we can try to split the virtual register in colder blocks, if success, the original COPY instructions can be deleted, and the new COPY instructions in colder blocks will be generated as register move instructions. It results in fewer dynamic register move instructions executed. The new test case split-reg-with-hint.ll gives an example, the hot path contains 24 instructions without this patch, now it is only 4 instructions with this patch. Differential Revision: https://reviews.llvm.org/D156491	2023-09-15 19:52:50 +00:00
Benjamin Kramer	3454cf67bd	Revert "[MachineLICM] Handle Subloops" This reverts commit 5ec9699c4d1f165364586d825baef434e2c110b4. It accesses MI after it has been hoisted.	2023-09-15 13:20:31 +02:00
Jay Foad	ceb68eea8c	[AMDGPU] Remove repeated -mtriple options from RUN lines (#66486)	2023-09-15 11:29:24 +01:00
Pierre van Houtryve	e9e3868707	[AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346) Addresses the FIXME for both DAGISel and GISel.	2023-09-15 08:11:01 +02:00
Jingu Kang	5ec9699c4d	[MachineLICM] Handle Subloops Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass handle subloops with only visiting outermost loop's blocks once. Differential Revision: https://reviews.llvm.org/D154205	2023-09-14 18:07:31 +01:00
Pierre van Houtryve	3d0353793b	[AMDGPU] Fix `HasFP32Denormals` check in FDIV32 lowering (#66212) Fixes SWDEV-403219	2023-09-14 08:47:10 +02:00
Jeffrey Byrnes	372115fadd	[AMDGPU] Precommit test for i8 vector CopyToReg handling patch Adds test to show impact on cross block CopyToReg & CopyFromReg handling for n x i8, and shows NFC on CC Differential Revision: https://reviews.llvm.org/D159303 Change-Id: Ib6d9802dbebe8e3245e4ccfd4a6f23357de8c480	2023-09-13 11:27:15 -07:00
Simon Pilgrim	e6b85c3027	[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case (REAPPLIED) Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future. Reapplied after reversion at e1e3c75c7dad72 with a tweak to the pseudo-probe-peep.ll test Differential Revision: https://reviews.llvm.org/D158068	2023-09-13 12:33:39 +01:00
Simon Pilgrim	e1e3c75c7d	Revert rG6c56cf71ee82ec3a28e0dfc2b751bd10c16929da "[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case" Need to address a missed test change	2023-09-13 11:27:47 +01:00
Simon Pilgrim	6c56cf71ee	[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future. Differential Revision: https://reviews.llvm.org/D158068	2023-09-13 11:01:58 +01:00
Matt Arsenault	231aa0f212	AMDGPU: Avoid creating vector extracts if we aren't going to do anything Try to avoid expensive checks failures from reporting no changes when some dead instructions were introduced.	2023-09-13 09:45:34 +03:00
Matt Arsenault	edecb60481	Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.	2023-09-13 08:38:48 +03:00
Pravin Jagtap	3755ea93b4	[AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082) [D156301](https://reviews.llvm.org/D156301) introduced atomic optimizations for FAdd/FSub. For FSub, reduction/scan needs to be performed using add operation (`not sub`) and memory location will be updated by reduced value using atomic sub later by only one lane. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2023-09-13 09:57:10 +05:30
Jeffrey Byrnes	db47264ab3	Revert "[AMDGPU]: Allow combining into v_dot4" (#66158) This reverts commit 7fda1b74be4a173031192d8516869e87e6b7582d.	2023-09-12 16:57:17 -07:00
jwanggit86	b853988e0d	[AMDGPU] Port AMDGPURewriteUndefForPHI to new pass manager (#66008) This patch ports the AMDGPURewriteUndefForPHI pass to the new pass manager. With this, the pass is supported under both the legacy and the new pass managers. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-09-12 13:32:02 -07:00
Matt Arsenault	c48248d7f9	AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp https://reviews.llvm.org/D158130	2023-09-12 23:23:10 +03:00
Matt Arsenault	72a7024add	AMDGPU: Correctly lower llvm.sqrt.f32 Make codegen emit correctly rounded sqrt by default. Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner. https://reviews.llvm.org/D158129	2023-09-12 23:22:54 +03:00
Jay Foad	928c9d6851	[AMDGPU] Fix some MIR tests (#66090) Fix some problems in hand written MIR tests that only showed up when I tried to run LiveIntervals on them, after which they failed machine verification with "Use not jointly dominated by defs" errors.	2023-09-12 16:32:41 +01:00
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060) This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.	2023-09-12 15:13:59 +05:30
Ivan Kosarev	eaf737a4e0	[AMDGPU] Remove the GFX11 runs in CodeGen/AMDGPU/fma.f16.ll. It still fails with expensive checks enabled. This partially reverts: a1e38e0b8e3e [AMDGPU][GFX11] Add more test coverage for FMA instructions.	2023-09-12 10:30:52 +01:00
Ivan Kosarev	a1e38e0b8e	[AMDGPU][GFX11] Add more test coverage for FMA instructions. (#65935) This is another attempt to update the tests to run for GFX11. Previously done in <https://reviews.llvm.org/D153269>, and then reverted in <https://reviews.llvm.org/rG2d3e6c440244ad94777aa13566b0376eb3c088f1> due to a failure on a buildbot with expensive checks enabled. Commit 4b1702e87a2687569b197aea4721353f8b788182 fixed the problem.	2023-09-12 09:40:10 +01:00
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
pvanhout	2126a18d86	[AMDGPU] Regen combine-fma-add-mul-pre-legalize.mir	2023-09-12 08:50:12 +02:00
Fangrui Song	806761a762	[test] Change llc -march= to -mtriple= The issue is uncovered by #47698: for IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. riscv64-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly.	2023-09-11 14:42:37 -07:00
Vitaly Buka	f106b3f135	Revert "[PHIElimination] Handle subranges in LiveInterval updates" Leaks memory. This reverts commit 3bff611068ae70e3273a46bbc72bc66b66f98c1c.	2023-09-11 11:09:26 -07:00
Jeremy Morse	1ce1732f82	[DebugInfo] Use getStableDebugLoc to pick IRBuilder DebugLocs When IRBuilder is given an insertion position and there is debug-info, it sets the DebugLoc of newly inserted instructions to the DebugLoc of the insertion position. Unfortunately, that means if you insert in front of a debug intrinsics, your "real" instructions get potentially-misleading source locations from the debug intrinsics. Worse, if you compile -gmlt to get source locations but no variable locations, you'll get different source locations to a normal -g build, which is silly. Rectify this with the getStableDebugLoc method, which skips over any debug intrinsics to find the next "real" instruction. This is the source location that you would get if you compile with -gmlt, and it remains stable in the presence of debug intrinsics. The changed tests show a few locations where this has been happening, for example selecting line-zero locations for instrumentation on a perfectly valid call site. Differential Revision: https://reviews.llvm.org/D159485	2023-09-11 19:00:44 +01:00
Stanislav Mekhanoshin	070c2570ad	[AMDGPU] Global ISel for packed fp32 instructions (#65803)	2023-09-11 10:48:37 -07:00
Stanislav Mekhanoshin	093aa37744	[AMDGPU] Autogenerate min.ll/max.ll tests. NFC. (#65786)	2023-09-11 10:29:53 -07:00
Carl Ritson	3bff611068	[PHIElimination] Handle subranges in LiveInterval updates Add handling for subrange updates in LiveInterval preservation. This requires extending MachineBasicBlock::SplitCriticalEdge to also update subrange intervals. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D158144	2023-09-11 17:15:09 +09:00
Carl Ritson	1d8a94c4ff	[AMDGPU] SILowerControlFlow: fix preservation of LiveIntervals In emitElse live interval for SI_ELSE source must be recalculated as SI_ELSE is removed, and new user is placed at block start. In emitIfBreak live interval for new created AndReg must be computed. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D158141	2023-09-11 13:46:28 +09:00
Carl Ritson	46ee3b3914	[AMDGPU] SILowerI1Copies: clear kill flags on COPY (#65883) Clear kill flags on COPY source as it will be reused.	2023-09-11 12:30:08 +09:00
Matt Arsenault	17bd80601e	AMDGPU: Implement llvm.get.fpmode Currently s_getreg_b32 is missing the possible mode use. Really we need separate pseudos for mode-only accesses, but leave this as a pre-existing issue. https://reviews.llvm.org/D152710	2023-09-10 10:19:19 +03:00
David Stuttard	8c03239934	[AMDGPU] New intrinsic void llvm.amdgcn.s.nop(i16) (#65757) This allows front ends to insert s_nops - this is most often when a delay less than s_sleep 1 is required.	2023-09-08 16:24:10 +01:00
Jay Foad	8669a9f93a	[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65765) SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD instruction, before we delete the old IMAGE_LOAD instruction. But in UpdateNodeOperands can do CSE on the fly and return a different EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still exist and would refer to the old deleted IMAGE_LOAD instruction. This caused errors like: t31: v3i32,ch = <<Deleted Node!>> # D:1 This target-independent node should have been selected! UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209! Fix it by detecting the CSE case and replacing all uses of the original EXTRACT_SUBREG node with the CSE'd one. Recommit with a fix for a use-after-free bug in the first version of this patch (#65340) which was caught by asan.	2023-09-08 16:16:02 +01:00
Jay Foad	dd5af895bb	[AMDGPU] Mark S_NOP as having side effects (#65745) This prevents S_NOP from being rescheduled past other (side-effecting) instructions, which is useful because it is generally used to introduce a short delay or to avoid hazards. Currently this only affects MIR tests because the compiler itself only inserts nops in PostRAHazardRecognizer which runs after all scheduling.	2023-09-08 14:05:56 +01:00
Nicolai Hähnle	2eb767c9e1	AMDGPU: Scratch instructions are trivially disjoint from SMEM and buffer instructions (#65287) Scratch instructions are always in addrspace(5), which can only alias with flat (and itself). SMEM and buffer instructions can never reference those address spaces, so they are trivially disjoint.	2023-09-08 07:43:36 +02:00
Jeffrey Byrnes	5044531afd	[AMDGPU] Teach CalculateByteProvider about AMDGPUISD::PERM (#65547) As a standalone patch, it has limited effect. However, it is necessary as it supports upcoming commits.	2023-09-07 15:13:42 -07:00
Jeffrey Byrnes	7fda1b74be	[AMDGPU]: Allow combining into v_dot4 Differential Revision: https://reviews.llvm.org/D155995 Change-Id: I794f540217f0f84141338757b41b1be0493c7207	2023-09-07 12:58:48 -07:00
Amara Emerson	1cc9f626cb	[GlobalISel] Add constant-folding of FP binops to combiner. (#65230)	2023-09-07 19:33:35 +03:00
pvanhout	69036eb735	[AMDGPU] Fix code-size-estimate.mir test Expensive-checks was failing on it.	2023-09-07 14:04:12 +02:00
Pierre van Houtryve	30955c9d22	[AMDGPU] Fix V_MOV_B32_indirect inst size (#65584) This inst lowers to a normal v_mov_b32 so it's not zero-sized, but has a size of 4. Solves SWDEV-416337	2023-09-07 13:12:58 +02:00
Tuan Chuong Goh	b7a305deca	[AArch64][GlobalISel] Optimise Combine Funnel Shift Combine any funnel shift with a shift amount of 0 to a copy. Modulo is applied to shift amount if it is larger than the instruction's bitwidth. Differential Revision: https://reviews.llvm.org/D157591	2023-09-07 11:58:12 +01:00
Florian Mayer	42a1d16179	Revert "[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340)" This reverts commit 11171d81aeafb0c2818f288900423e366a2787fc. Broke ASAN bot.	2023-09-06 13:16:55 -07:00
Jay Foad	11171d81ae	[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340) SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD instruction, before we delete the old IMAGE_LOAD instruction. But in UpdateNodeOperands can do CSE on the fly and return a different EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still exist and would refer to the old deleted IMAGE_LOAD instruction. This caused errors like: t31: v3i32,ch = <<Deleted Node!>> # D:1 This target-independent node should have been selected! UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209! Fix it by detecting the CSE case and replacing all uses of the original EXTRACT_SUBREG node with the CSE'd one.	2023-09-06 12:51:44 +01:00
Pravin Jagtap	b230472f22	[AMDGPU] Extend v2i16 & v2f16 support for llvm.amdgcn.update.dpp intr (#65318) Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2023-09-06 10:20:34 +05:30
Amara Emerson	6c31f20fee	[GlobalISel] Fold fmul x, 1.0 -> x (#65379)	2023-09-06 03:14:16 +08:00
Amara Emerson	08e04209d8	[GlobalISel] Commute G_FMUL and G_FADD constant LHS to RHS. (#65298)	2023-09-05 23:48:34 +08:00
Nicolai Hähnle	62790a8d4a	AMDGPU: Fix test from previous commit	2023-09-05 00:31:49 +02:00
Nicolai Hähnle	f5fb6ad2e5	AMDGPU: Precommit a test file Demonstrates bad scheduling for private load/store vs. buffer intrinsics.	2023-09-05 00:17:46 +02:00
Jay Foad	71ca53b6cf	[GlobalISel] Lower G_SHUFFLE_VECTOR with scalar result (#65275)	2023-09-04 13:32:43 -04:00


6777 Commits