6777 Commits

Author SHA1 Message Date
Guozhi Wei
cbdccb30c2 [RA] Split a virtual register in cold blocks if it is not assigned preferred physical register
If a virtual register is not assigned its preferred physical register, some
COPY instructions will become real register move instructions. In this case we
can try to split the virtual register in colder blocks; if that succeeds, the
original COPY instructions can be deleted, and the new COPY instructions in the
colder blocks are emitted as register moves instead. The result is fewer
register move instructions executed dynamically.

The new test case split-reg-with-hint.ll gives an example: the hot path
contains 24 instructions without this patch and only 4 with it.

Differential Revision: https://reviews.llvm.org/D156491
2023-09-15 19:52:50 +00:00
Benjamin Kramer
3454cf67bd Revert "[MachineLICM] Handle Subloops"
This reverts commit 5ec9699c4d1f165364586d825baef434e2c110b4. It
accesses MI after it has been hoisted.
2023-09-15 13:20:31 +02:00
Jay Foad
ceb68eea8c [AMDGPU] Remove repeated -mtriple options from RUN lines (#66486)
2023-09-15 11:29:24 +01:00
Pierre van Houtryve
e9e3868707 [AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346)
Addresses the FIXME for both DAGISel and GISel.
2023-09-15 08:11:01 +02:00
Jingu Kang
5ec9699c4d [MachineLICM] Handle Subloops
Following discussion on https://reviews.llvm.org/D154205, make the MachineLICM
pass handle subloops, visiting the blocks of the outermost loop only once.

Differential Revision: https://reviews.llvm.org/D154205
2023-09-14 18:07:31 +01:00
Pierre van Houtryve
3d0353793b [AMDGPU] Fix HasFP32Denormals check in FDIV32 lowering (#66212)
Fixes SWDEV-403219
2023-09-14 08:47:10 +02:00
Jeffrey Byrnes
372115fadd [AMDGPU] Precommit test for i8 vector CopyToReg handling patch
Adds a test showing the impact on cross-block CopyToReg & CopyFromReg handling for n x i8, and shows NFC on CC.

Differential Revision: https://reviews.llvm.org/D159303

Change-Id: Ib6d9802dbebe8e3245e4ccfd4a6f23357de8c480
2023-09-13 11:27:15 -07:00
Simon Pilgrim
e6b85c3027 [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case (REAPPLIED)
Follow-up to D59363, which failed to handle the icmp(X,undef) -> isTrueWhenEqual case (similar to llvm::ConstantFoldCompareInstruction).

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.
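A minimal, self-contained sketch of the idea behind this fold, assuming a hypothetical Pred enum and helper names (this is not the DAGCombiner or ConstantFold code): because undef may be chosen to equal X, icmp(X, undef) can fold to whatever the predicate yields for two equal operands.

```cpp
// Hypothetical stand-in for an integer compare predicate.
enum class Pred { EQ, NE, UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE };

// True for the predicates that hold when both operands are equal.
constexpr bool isTrueWhenEqual(Pred p) {
  switch (p) {
  case Pred::EQ: case Pred::UGE: case Pred::ULE:
  case Pred::SGE: case Pred::SLE:
    return true;   // equality and non-strict orderings hold for X op X
  default:
    return false;  // NE and the strict orderings are false for X op X
  }
}

// Hypothetical fold of icmp(X, undef): undef may equal X, so the
// result is the predicate's value for equal operands.
constexpr bool foldICmpWithUndef(Pred p) { return isTrueWhenEqual(p); }

static_assert(foldICmpWithUndef(Pred::ULE), "x <=u x is always true");
static_assert(!foldICmpWithUndef(Pred::SLT), "x <s x is always false");
```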

Reapplied after the revert in e1e3c75c7dad72, with a tweak to the pseudo-probe-peep.ll test.

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 12:33:39 +01:00
Simon Pilgrim
e1e3c75c7d Revert rG6c56cf71ee82ec3a28e0dfc2b751bd10c16929da "[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case"
Need to address a missed test change
2023-09-13 11:27:47 +01:00
Simon Pilgrim
6c56cf71ee [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case
Follow-up to D59363, which failed to handle the icmp(X,undef) -> isTrueWhenEqual case (similar to llvm::ConstantFoldCompareInstruction).

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 11:01:58 +01:00
Matt Arsenault
231aa0f212 AMDGPU: Avoid creating vector extracts if we aren't going to do anything
Try to avoid expensive-checks failures caused by reporting no changes
when some dead instructions were introduced.
2023-09-13 09:45:34 +03:00
Matt Arsenault
edecb60481 Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.
2023-09-13 08:38:48 +03:00
Pravin Jagtap
3755ea93b4 [AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082)
[D156301](https://reviews.llvm.org/D156301) introduced atomic
optimizations for FAdd/FSub. For FSub, the reduction/scan needs to be
performed using an add operation (not `sub`), and the memory location is
later updated with the reduced value by an atomic sub from only one lane.
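A minimal sketch of why the reduction must use add, modeling four lanes in plain C++ with hypothetical function names (this is not the AtomicOptimizer implementation, and it models only the reduction, not the per-lane scan):

```cpp
#include <cstddef>

// Unoptimized: each lane issues its own atomic fsub on memory.
constexpr float eachLaneFSub(float mem, const float* v, std::size_t lanes) {
  for (std::size_t i = 0; i < lanes; ++i) mem -= v[i];
  return mem;
}

// Optimized: reduce the lane values with fadd (identity 0.0), then a
// single lane performs one atomic fsub of the reduced value.
constexpr float reducedFSub(float mem, const float* v, std::size_t lanes) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < lanes; ++i) sum += v[i];
  return mem - sum;
}

// Reducing with fsub instead yields -(sum), so the final fsub adds it.
constexpr float wronglyReducedFSub(float mem, const float* v, std::size_t lanes) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < lanes; ++i) acc -= v[i];
  return mem - acc;
}

// Values chosen to be exactly representable, so rounding plays no part.
constexpr float kLaneValues[] = {1.5f, 2.25f, 0.5f, 4.0f};
static_assert(reducedFSub(10.0f, kLaneValues, 4) ==
                  eachLaneFSub(10.0f, kLaneValues, 4), "add reduction matches");
static_assert(wronglyReducedFSub(10.0f, kLaneValues, 4) !=
                  eachLaneFSub(10.0f, kLaneValues, 4), "sub reduction differs");
```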

---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-09-13 09:57:10 +05:30
Jeffrey Byrnes
db47264ab3 Revert "[AMDGPU]: Allow combining into v_dot4" (#66158)
This reverts commit 7fda1b74be4a173031192d8516869e87e6b7582d.
2023-09-12 16:57:17 -07:00
jwanggit86
b853988e0d [AMDGPU] Port AMDGPURewriteUndefForPHI to new pass manager (#66008)
This patch ports the AMDGPURewriteUndefForPHI pass to the new pass
manager. With this, the pass is supported under both the legacy and the
new pass managers.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-09-12 13:32:02 -07:00
Matt Arsenault
c48248d7f9 AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp
https://reviews.llvm.org/D158130
2023-09-12 23:23:10 +03:00
Matt Arsenault
72a7024add AMDGPU: Correctly lower llvm.sqrt.f32
Make codegen emit correctly rounded sqrt by default.

Emit the fast (but only kind of fast) expansion in AMDGPUCodeGenPrepare
based on !fpmath, like the fdiv case. Hack around visitation ordering
problems caused by AMDGPUCodeGenPrepare using forward iteration instead of
a well-behaved combiner.

https://reviews.llvm.org/D158129
2023-09-12 23:22:54 +03:00
Jay Foad
928c9d6851 [AMDGPU] Fix some MIR tests (#66090)
Fix some problems in hand-written MIR tests that only showed up when I
tried to run LiveIntervals on them, after which they failed machine
verification with "Use not jointly dominated by defs" errors.
2023-09-12 16:32:41 +01:00
Saiyedul Islam
466a8149b3 Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)
This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
2023-09-12 15:13:59 +05:30
Ivan Kosarev
eaf737a4e0 [AMDGPU] Remove the GFX11 runs in CodeGen/AMDGPU/fma.f16.ll.
It still fails with expensive checks enabled.

This partially reverts:
a1e38e0b8e3e [AMDGPU][GFX11] Add more test coverage for FMA instructions.
2023-09-12 10:30:52 +01:00
Ivan Kosarev
a1e38e0b8e [AMDGPU][GFX11] Add more test coverage for FMA instructions. (#65935)
This is another attempt to update the tests to run for GFX11. Previously
done in <https://reviews.llvm.org/D153269>, and then reverted in
<https://reviews.llvm.org/rG2d3e6c440244ad94777aa13566b0376eb3c088f1>
due to a failure on a buildbot with expensive checks enabled. Commit
4b1702e87a2687569b197aea4721353f8b788182 fixed the problem.
2023-09-12 09:40:10 +01:00
Saiyedul Islam
0a8d17e79b [AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818
2023-09-12 13:53:31 +05:30
pvanhout
2126a18d86 [AMDGPU] Regen combine-fma-add-mul-pre-legalize.mir
2023-09-12 08:50:12 +02:00
Fangrui Song
806761a762 [test] Change llc -march= to -mtriple=
The issue was uncovered by #47698: for IR files without a target triple,
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense, e.g. riscv64-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign because we recognize $unknown-apple-darwin as
ELF instead of rejecting it outright.
2023-09-11 14:42:37 -07:00
Vitaly Buka
f106b3f135 Revert "[PHIElimination] Handle subranges in LiveInterval updates"
Leaks memory.

This reverts commit 3bff611068ae70e3273a46bbc72bc66b66f98c1c.
2023-09-11 11:09:26 -07:00
Jeremy Morse
1ce1732f82 [DebugInfo] Use getStableDebugLoc to pick IRBuilder DebugLocs
When IRBuilder is given an insertion position and there is debug-info, it
sets the DebugLoc of newly inserted instructions to the DebugLoc of the
insertion position. Unfortunately, that means if you insert in front of a
debug intrinsic, your "real" instructions get potentially-misleading
source locations from the debug intrinsic. Worse, if you compile -gmlt to
get source locations but no variable locations, you'll get different source
locations from a normal -g build, which is silly.

Rectify this with the getStableDebugLoc method, which skips over any debug
intrinsics to find the next "real" instruction. This is the source location
that you would get if you compile with -gmlt, and it remains stable in the
presence of debug intrinsics. The changed tests show a few locations where
this has been happening, for example selecting line-zero locations for
instrumentation on a perfectly valid call site.

Differential Revision: https://reviews.llvm.org/D159485
2023-09-11 19:00:44 +01:00
Stanislav Mekhanoshin
070c2570ad [AMDGPU] Global ISel for packed fp32 instructions (#65803)
2023-09-11 10:48:37 -07:00
Stanislav Mekhanoshin
093aa37744 [AMDGPU] Autogenerate min.ll/max.ll tests. NFC. (#65786)
2023-09-11 10:29:53 -07:00
Carl Ritson
3bff611068 [PHIElimination] Handle subranges in LiveInterval updates
Add handling for subrange updates in LiveInterval preservation.
This requires extending MachineBasicBlock::SplitCriticalEdge
to also update subrange intervals.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158144
2023-09-11 17:15:09 +09:00
Carl Ritson
1d8a94c4ff [AMDGPU] SILowerControlFlow: fix preservation of LiveIntervals
In emitElse, the live interval for the SI_ELSE source must be recalculated,
because SI_ELSE is removed and a new user is placed at the block start.
In emitIfBreak, the live interval for the newly created AndReg must be
computed.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158141
2023-09-11 13:46:28 +09:00
Carl Ritson
46ee3b3914 [AMDGPU] SILowerI1Copies: clear kill flags on COPY (#65883)
Clear kill flags on COPY source as it will be reused.
2023-09-11 12:30:08 +09:00
Matt Arsenault
17bd80601e AMDGPU: Implement llvm.get.fpmode
Currently s_getreg_b32 is missing the possible mode use. Really we
need separate pseudos for mode-only accesses, but leave this as a
pre-existing issue.

https://reviews.llvm.org/D152710
2023-09-10 10:19:19 +03:00
David Stuttard
8c03239934 [AMDGPU] New intrinsic void llvm.amdgcn.s.nop(i16) (#65757)
This allows front ends to insert s_nops, most often when a delay
shorter than s_sleep 1 is required.
2023-09-08 16:24:10 +01:00
Jay Foad
8669a9f93a [AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65765)
SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:

t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!

Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.

Recommit with a fix for a use-after-free bug in the first version of
this patch (#65340) which was caught by asan.
2023-09-08 16:16:02 +01:00
Jay Foad
dd5af895bb [AMDGPU] Mark S_NOP as having side effects (#65745)
This prevents S_NOP from being rescheduled past other (side-effecting)
instructions, which is useful because it is generally used to introduce
a short delay or to avoid hazards. Currently this only affects MIR tests
because the compiler itself only inserts nops in PostRAHazardRecognizer
which runs after all scheduling.
2023-09-08 14:05:56 +01:00
Nicolai Hähnle
2eb767c9e1 AMDGPU: Scratch instructions are trivially disjoint from SMEM and buffer instructions (#65287)
Scratch instructions are always in addrspace(5), which can only alias
with flat (and itself). SMEM and buffer instructions can never reference
those address spaces, so they are trivially disjoint.
2023-09-08 07:43:36 +02:00
Jeffrey Byrnes
5044531afd [AMDGPU] Teach CalculateByteProvider about AMDGPUISD::PERM (#65547)
As a standalone patch, it has limited effect. However, it is necessary
as it supports upcoming commits.
2023-09-07 15:13:42 -07:00
Jeffrey Byrnes
7fda1b74be [AMDGPU]: Allow combining into v_dot4
Differential Revision: https://reviews.llvm.org/D155995

Change-Id: I794f540217f0f84141338757b41b1be0493c7207
2023-09-07 12:58:48 -07:00
Amara Emerson
1cc9f626cb [GlobalISel] Add constant-folding of FP binops to combiner. (#65230)
2023-09-07 19:33:35 +03:00
pvanhout
69036eb735 [AMDGPU] Fix code-size-estimate.mir test
Expensive-checks was failing on it.
2023-09-07 14:04:12 +02:00
Pierre van Houtryve
30955c9d22 [AMDGPU] Fix V_MOV_B32_indirect inst size (#65584)
This instruction lowers to a normal v_mov_b32, so it is not zero-sized;
it has a size of 4 bytes.

Solves SWDEV-416337
2023-09-07 13:12:58 +02:00
Tuan Chuong Goh
b7a305deca [AArch64][GlobalISel] Optimise Combine Funnel Shift
Combine any funnel shift with a shift amount of 0 to a copy.
The shift amount is reduced modulo the instruction's bit width if it is
larger than that width.
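A minimal sketch of the funnel shift semantics that make this combine valid, written as plain C++ helpers on 32-bit values (hypothetical names; this follows the LangRef definition of llvm.fshl/llvm.fshr, which G_FSHL/G_FSHR model, and is not the GlobalISel combiner code):

```cpp
#include <cstdint>

// fshl: high 32 bits of the 64-bit concatenation (hi:lo) shifted left.
constexpr std::uint32_t fshl32(std::uint32_t hi, std::uint32_t lo,
                               std::uint32_t amt) {
  amt %= 32;                                // amount is taken modulo the width
  if (amt == 0) return hi;                  // zero shift: copy of first operand
  return (hi << amt) | (lo >> (32 - amt));
}

// fshr: low 32 bits of the 64-bit concatenation (hi:lo) shifted right.
constexpr std::uint32_t fshr32(std::uint32_t hi, std::uint32_t lo,
                               std::uint32_t amt) {
  amt %= 32;
  if (amt == 0) return lo;                  // zero shift: copy of second operand
  return (hi << (32 - amt)) | (lo >> amt);
}

// A shift of 0 (or any multiple of 32) folds to a copy ...
static_assert(fshl32(0x12345678u, 0x9ABCDEF0u, 0) == 0x12345678u, "fshl 0");
static_assert(fshr32(0x12345678u, 0x9ABCDEF0u, 64) == 0x9ABCDEF0u, "fshr 64");
// ... and an out-of-range amount behaves like amt % 32.
static_assert(fshl32(0x12345678u, 0x9ABCDEF0u, 36) ==
                  fshl32(0x12345678u, 0x9ABCDEF0u, 4), "fshl 36 == fshl 4");
```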

Differential Revision: https://reviews.llvm.org/D157591
2023-09-07 11:58:12 +01:00
Florian Mayer
42a1d16179 Revert "[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340)"
This reverts commit 11171d81aeafb0c2818f288900423e366a2787fc.

Broke ASAN bot.
2023-09-06 13:16:55 -07:00
Jay Foad
11171d81ae [AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340)
SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:

t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!

Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.
2023-09-06 12:51:44 +01:00
Pravin Jagtap
b230472f22 [AMDGPU] Extend v2i16 & v2f16 support for llvm.amdgcn.update.dpp intr (#65318)
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-09-06 10:20:34 +05:30
Amara Emerson
6c31f20fee [GlobalISel] Fold fmul x, 1.0 -> x (#65379)
2023-09-06 03:14:16 +08:00
Amara Emerson
08e04209d8 [GlobalISel] Commute G_FMUL and G_FADD constant LHS to RHS. (#65298)
2023-09-05 23:48:34 +08:00
Nicolai Hähnle
62790a8d4a AMDGPU: Fix test from previous commit
2023-09-05 00:31:49 +02:00
Nicolai Hähnle
f5fb6ad2e5 AMDGPU: Precommit a test file
Demonstrates bad scheduling for private load/store vs. buffer
intrinsics.
2023-09-05 00:17:46 +02:00
Jay Foad
71ca53b6cf [GlobalISel] Lower G_SHUFFLE_VECTOR with scalar result (#65275)
2023-09-04 13:32:43 -04:00