8304 Commits

Author SHA1 Message Date
David Stuttard
ce031fc17a
[AMDGPU] Fix non-deterministic iteration order in SIFixSGPRCopies (#66617)
Use of DenseSet was causing some non-determinism in SIFixSGPRCopies. Changing
to SetVector fixes the problem.
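
As a rough illustration of why this helps, here is a minimal sketch of an insertion-ordered set. `OrderedSet` and `visitOrder` are hypothetical names used for illustration only; this is not the `llvm::SetVector` implementation, only the same idea: a hashed container answers membership queries, while iteration walks a vector in insertion order.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an insertion-ordered set (not llvm::SetVector).
template <typename T> class OrderedSet {
  std::vector<T> Order;       // iterated; always in insertion order
  std::unordered_set<T> Seen; // membership only; never iterated
public:
  // Returns true if V was newly inserted, false if it was already present.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    assert(Order.size() == Seen.size() && "containers out of sync");
    return true;
  }
  std::size_t size() const { return Order.size(); }
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
};

// Deduplicates Inputs and returns them in first-seen order, independent of
// hash values or pointer addresses.
std::vector<int> visitOrder(const std::vector<int> &Inputs) {
  OrderedSet<int> S;
  for (int V : Inputs)
    S.insert(V);
  return std::vector<int>(S.begin(), S.end());
}
```

Iterating a hashed set of pointers can change order from run to run because the addresses differ; the vector side of the structure removes that dependency.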
2023-09-18 10:08:53 +01:00
Matt Arsenault
1f15e39d81 AMDGPU/GlobalISel: Don't pointlessly check for convergent intrinsics
The set of handled intrinsics for fneg combines aren't convergent. The only
case we might want to handle is mov_dpp.
2023-09-15 23:32:19 +03:00
Jay Foad
fcbdcb13ce
[AMDGPU] Tweak tuple weight calculation. NFC. (#66490)
This just makes it more obvious that GCNRegPressure does not actually
use pressure sets.
2023-09-15 16:30:06 +01:00
Pierre van Houtryve
e9e3868707
[AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346)
Addresses the FIXME for both DAGISel and GISel.
2023-09-15 08:11:01 +02:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
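
A small sketch of what the change means for calling code. The enum below only mirrors the new name for illustration, and `OldCodeGenOpt` and `levelFromInt` are hypothetical; the real definitions live in LLVM's headers.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical model of the old style: an unscoped enum in a namespace.
namespace OldCodeGenOpt {
enum Level { None, Less, Default, Aggressive };
}

// Hypothetical mirror of the new style: a scoped enum.
enum class CodeGenOptLevel { None, Less, Default, Aggressive };

// The unscoped enum silently converts to int; the scoped one does not, so
// mismatched arguments become compile errors instead of silent conversions.
static_assert(std::is_convertible<OldCodeGenOpt::Level, int>::value,
              "unscoped enums convert implicitly");
static_assert(!std::is_convertible<CodeGenOptLevel, int>::value,
              "scoped enums need an explicit cast");

// Code that stored the level as an integer now needs an explicit conversion.
CodeGenOptLevel levelFromInt(int N) {
  assert(N >= 0 && N <= 3 && "optimization level out of range");
  return static_cast<CodeGenOptLevel>(N);
}
```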
2023-09-14 14:10:14 -07:00
Pierre van Houtryve
3d0353793b
[AMDGPU] Fix HasFP32Denormals check in FDIV32 lowering (#66212)
Fixes SWDEV-403219
2023-09-14 08:47:10 +02:00
Simon Pilgrim
47a9cd0343 [AMDGPU] Remove constexpr from getNumUserSGPRForField/getMaxNumPreloadedSGPRs to appease older gcc builds
Older versions of gcc wouldn't accept the constexpr getNumUserSGPRForField (introduced in D159439 / 343be5132e2831d85) as it couldn't treat the llvm_unreachable call as constexpr
2023-09-13 12:19:28 +01:00
Matt Arsenault
231aa0f212 AMDGPU: Avoid creating vector extracts if we aren't going to do anything
Try to avoid expensive checks failures from reporting no changes
when some dead instructions were introduced.
2023-09-13 09:45:34 +03:00
Matt Arsenault
edecb60481 Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.
2023-09-13 08:38:48 +03:00
Pravin Jagtap
3755ea93b4
[AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082)
[D156301](https://reviews.llvm.org/D156301) introduced atomic
optimizations for FAdd/FSub. For FSub, the reduction/scan needs to be
performed using an add operation (not sub); the memory location is then
updated with the reduced value by a single lane using atomic sub.
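
A minimal host-side sketch of the arithmetic, assuming a simplified model in which lane i wants to perform `Mem -= Vals[i]`. The names `FSubResult`, `optimizedAtomicFSub` and `naiveAtomicFSub` are hypothetical and this is not the pass's code; it only shows why the scan accumulates with add while the final update uses sub.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one wavefront.
struct FSubResult {
  double NewMem;               // memory value after the single atomic sub
  std::vector<double> OldVals; // value each lane would get back
};

FSubResult optimizedAtomicFSub(double Mem, const std::vector<double> &Vals) {
  FSubResult R;
  double Prefix = 0.0; // exclusive scan accumulator, built with add
  for (std::size_t I = 0; I < Vals.size(); ++I) {
    R.OldVals.push_back(Mem - Prefix); // lane I sees Mem minus earlier lanes
    Prefix += Vals[I];                 // add, not sub
  }
  R.NewMem = Mem - Prefix; // one lane applies the reduced value with a sub
  assert(R.OldVals.size() == Vals.size());
  return R;
}

// Reference: every lane performs its own subtraction in lane order.
double naiveAtomicFSub(double Mem, const std::vector<double> &Vals) {
  for (double V : Vals)
    Mem -= V;
  return Mem;
}
```

If the scan itself used sub, the reduced value would be the negated sum and the final atomic sub would add it back. In real floating-point code the two orderings can also differ by rounding, which this sketch ignores.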

---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-09-13 09:57:10 +05:30
Jeffrey Byrnes
db47264ab3
Revert "[AMDGPU]: Allow combining into v_dot4" (#66158)
This reverts commit 7fda1b74be4a173031192d8516869e87e6b7582d.
2023-09-12 16:57:17 -07:00
Kazu Hirata
a9c7ba964f [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPU.h:297:18: error: private field 'TM' is
  not used [-Werror,-Wunused-private-field]
2023-09-12 14:02:07 -07:00
jwanggit86
b853988e0d
[AMDGPU] Port AMDGPURewriteUndefForPHI to new pass manager (#66008)
This patch ports the AMDGPURewriteUndefForPHI pass to the new pass
manager. With this, the pass is supported under both the legacy and the
new pass managers.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-09-12 13:32:02 -07:00
Matt Arsenault
c48248d7f9 AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp
https://reviews.llvm.org/D158130
2023-09-12 23:23:10 +03:00
Matt Arsenault
72a7024add AMDGPU: Correctly lower llvm.sqrt.f32
Make codegen emit correctly rounded sqrt by default.

Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare
based on !fpmath, like the fdiv case. Hack around visitation ordering
problems from AMDGPUCodeGenPrepare using forward iteration instead of
a well behaved combiner.

https://reviews.llvm.org/D158129
2023-09-12 23:22:54 +03:00
Kazu Hirata
0bb49afeaf [AMDGPU] Fix an unused variable warning
This patch fixes:

  llvm/lib/Target/AMDGPU/SIISelLowering.cpp:2493:33: error: unused
  variable 'UserSGPRInfo' [-Werror,-Wunused-variable]
2023-09-12 12:06:36 -07:00
Austin Kerbow
343be5132e [AMDGPU] Add utilities to track number of user SGPRs. NFC.
Factor out and unify some common code that calculates and tracks the
number of user SGPRs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159439
2023-09-12 08:52:30 -07:00
Saiyedul Islam
466a8149b3
Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)
This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
2023-09-12 15:13:59 +05:30
Saiyedul Islam
0a8d17e79b
[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818
2023-09-12 13:53:31 +05:30
Jeremy Morse
e54277fa10 [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder
This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0],
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152468
2023-09-11 20:01:19 +01:00
Stanislav Mekhanoshin
070c2570ad
[AMDGPU] Global ISel for packed fp32 instructions (#65803)
2023-09-11 10:48:37 -07:00
Jeremy Morse
6942c64e81 [NFC][RemoveDIs] Prefer iterator-insertion over instructions
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.

At this stage, this is just changing the spelling of a few operations,
which will eventually become significant once the debug-info bearing
iterator is used.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152537
2023-09-11 11:48:45 +01:00
Piotr Sobczak
8fdf61a3bd
[AMDGPU][NFCI] Refactor BUFInstructions.td (#65746)
Make the code more consistent:
- More pattern classes follow the simpler interface with string instead
of pseudos.
- CmpSwap patterns are encapsulated in SIBufferAtomicCmpSwapPat.
- Pseudo store patterns are separated out, similarly to the load
counterparts.
- MUBUF_Offset_Load_Pat is now GCNPat, as others.
2023-09-11 08:04:22 +02:00
Carl Ritson
1d8a94c4ff [AMDGPU] SILowerControlFlow: fix preservation of LiveIntervals
In emitElse, the live interval for the SI_ELSE source must be recalculated,
as SI_ELSE is removed and a new user is placed at the block start.
In emitIfBreak, the live interval for the newly created AndReg must be
computed.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158141
2023-09-11 13:46:28 +09:00
Carl Ritson
46ee3b3914
[AMDGPU] SILowerI1Copies: clear kill flags on COPY (#65883)
Clear kill flags on COPY source as it will be reused.
2023-09-11 12:30:08 +09:00
Matt Arsenault
17bd80601e AMDGPU: Implement llvm.get.fpmode
Currently s_getreg_b32 is missing the possible mode use. Really we
need separate pseudos for mode-only accesses, but leave this as a
pre-existing issue.

https://reviews.llvm.org/D152710
2023-09-10 10:19:19 +03:00
jwanggit86
37b538819b
[AMDGPU] Incorrect error message regarding SCC modifier (#65660)
For the AMD GFX90A GPU, the SCC instruction modifier is allowed for
certain classes of instructions. However, the current assembler
generates an error message, "scc is not supported on this GPU",
regardless of the instruction. This fix modifies the message as well as
the logic for generating the message. Related tests are moved from
gfx90a_err.s to gfx90a_asm_features.s.

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-09-08 13:07:06 -07:00
David Stuttard
8c03239934
[AMDGPU] New intrinsic void llvm.amdgcn.s.nop(i16) (#65757)
This allows front ends to insert s_nops. This is most often needed when a
delay shorter than s_sleep 1 is required.
2023-09-08 16:24:10 +01:00
Jay Foad
8669a9f93a
[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65765)
SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:

t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!

Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.

Recommit with a fix for a use-after-free bug in the first version of
this patch (#65340) which was caught by asan.
2023-09-08 16:16:02 +01:00
Jay Foad
dd5af895bb
[AMDGPU] Mark S_NOP as having side effects (#65745)
This prevents S_NOP from being rescheduled past other (side-effecting)
instructions, which is useful because it is generally used to introduce
a short delay or to avoid hazards. Currently this only affects MIR tests
because the compiler itself only inserts nops in PostRAHazardRecognizer
which runs after all scheduling.
2023-09-08 14:05:56 +01:00
Nicolai Hähnle
2eb767c9e1
AMDGPU: Scratch instructions are trivially disjoint from SMEM and buffer instructions (#65287)
Scratch instructions are always in addrspace(5), which can only alias
with flat (and itself). SMEM and buffer instructions can never reference
those address spaces, so they are trivially disjoint.
2023-09-08 07:43:36 +02:00
Jeffrey Byrnes
5044531afd
[AMDGPU] Teach CalculateByteProvider about AMDGPUISD::PERM (#65547)
As a standalone patch, it has limited effect. However, it is necessary
as it supports upcoming commits.
2023-09-07 15:13:42 -07:00
Jeffrey Byrnes
7fda1b74be [AMDGPU]: Allow combining into v_dot4
Differential Revision: https://reviews.llvm.org/D155995

Change-Id: I794f540217f0f84141338757b41b1be0493c7207
2023-09-07 12:58:48 -07:00
Matt Arsenault
4fcc21bbce AMDGPU: Remove unused node definition
2023-09-07 20:30:14 +03:00
Pierre van Houtryve
30955c9d22
[AMDGPU] Fix V_MOV_B32_indirect inst size (#65584)
This inst lowers to a normal v_mov_b32 so it's not zero-sized, but has a
size of 4.

Solves SWDEV-416337
2023-09-07 13:12:58 +02:00
Stanislav Mekhanoshin
0dd4d3b5cc
[AMDGPU] Remove predicate on real packed fp32 instructions (#65589)
It is copied from the pseudo anyway.
2023-09-07 03:17:25 -07:00
Florian Mayer
42a1d16179 Revert "[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340)"
This reverts commit 11171d81aeafb0c2818f288900423e366a2787fc.

Broke ASAN bot.
2023-09-06 13:16:55 -07:00
Jay Foad
11171d81ae
[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340)
SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:

t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!

Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.
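
The hazard described above can be modelled with a tiny hash-consed graph. This is a simplified sketch under stated assumptions: `Node`, `Graph`, `get` and `updateOperand` are hypothetical and far simpler than SelectionDAG, but they show why a caller must use the returned pointer rather than assume the node was updated in place.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node: an opcode with a single operand id.
struct Node {
  int Opcode;
  int Operand;
};

class Graph {
  std::vector<std::unique_ptr<Node>> Nodes;     // owns every node
  std::map<std::pair<int, int>, Node *> CSEMap; // (opcode, operand) -> node
public:
  // Returns the unique node for (Opcode, Operand), creating it if needed.
  Node *get(int Opcode, int Operand) {
    std::pair<int, int> Key = std::make_pair(Opcode, Operand);
    auto It = CSEMap.find(Key);
    if (It != CSEMap.end())
      return It->second;
    Nodes.push_back(std::make_unique<Node>(Node{Opcode, Operand}));
    Node *N = Nodes.back().get();
    CSEMap[Key] = N;
    return N;
  }

  // Tries to update N in place. If an identical node already exists, that
  // node is returned instead and N is left untouched.
  Node *updateOperand(Node *N, int NewOperand) {
    assert(N && "expected a valid node");
    std::pair<int, int> NewKey = std::make_pair(N->Opcode, NewOperand);
    auto It = CSEMap.find(NewKey);
    if (It != CSEMap.end())
      return It->second; // CSE hit: N still refers to the old operand
    CSEMap.erase(std::make_pair(N->Opcode, N->Operand));
    N->Operand = NewOperand;
    CSEMap[NewKey] = N;
    return N;
  }
};
```

When the returned pointer differs from the argument, the original node still refers to the old operand, so its users have to be redirected to the returned node before the old operand is deleted.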
2023-09-06 12:51:44 +01:00
Sergei Barannikov
a479be0f39 [MC] Change tryParseRegister to return ParseStatus (NFC)
This finishes the work of replacing OperandMatchResultTy with
ParseStatus, started in D154101.
As a drive-by change, rename some RegNo variables to just Reg
(a leftover from the days when RegNo had 'unsigned' type).
2023-09-06 10:28:12 +03:00
Pravin Jagtap
b230472f22
[AMDGPU] Extend v2i16 & v2f16 support for llvm.amdgcn.update.dpp intr (#65318)
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-09-06 10:20:34 +05:30
pvanhout
4e513f69a1 [GlobalISel] Cleanup Combine.td
Now that the old backend is gone, clean up a few things that no longer make sense and tidy up the file a bit.

Depends on D158710

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158714
2023-09-05 08:19:06 +02:00
pvanhout
aaf6755631 [GlobalISel] Refactor Combiner API
Remove CodeGen leftovers from the old combiner backend and adapt the API to fit the new backend better.
It's now quite a bit closer to how InstructionSelector works.

- `CombinerInfo` is now a simple "options" struct.
- `Combiner` is now the base class of all TableGen'd combiner implementations.
    - Many fields have been moved from derived classes into that class.
    - It has been refactored to create & own the Observer and Builder.
- `tryCombineAll` TableGen'd method can now be renamed, which allows targets to implement the actual `tryCombineAll` call manually and do whatever they want to do before/after it.

Note: `CombinerHelper` needs to be mutable because none of its methods are const. This can be revisited later.

Depends on D158710

Reviewed By: aemerson, dsanders

Differential Revision: https://reviews.llvm.org/D158713
2023-09-05 08:19:05 +02:00
TY-AMD
b1b6c06567
[AMDGPU] Erase ShaderFunctions in AMDGPUPALMetadata::reset() (#65247)
2023-09-04 08:03:01 -04:00
Matt Arsenault
77c67436d9 LLT: Add some stub constructors for FP types
This is to start documenting uses to ease a future migration
to supporting different types with the same size.

https://reviews.llvm.org/D150605
2023-09-03 08:33:19 -04:00
Matt Arsenault
f7dcabe502 AMDGPU: Pass in TargetMachine to AMDGPULowerModuleLDSPass
https://reviews.llvm.org/D157660
2023-09-02 12:02:36 -04:00
Matt Arsenault
1f52060000 AMDGPU: Use poison instead of undef in module lds pass
2023-09-02 11:33:26 -04:00
Matt Arsenault
ee795fd1cf AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral
https://reviews.llvm.org/D158999
2023-09-01 08:22:16 -04:00
Matt Arsenault
def228553c AMDGPU: Use pown instead of pow if known integral
https://reviews.llvm.org/D158998
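
To illustrate the idea, here is a runtime sketch: when the exponent is known to be an integer, an integer-power routine can stand in for the general `pow`. The functions `isIntegral`, `pown` and `powKnownIntegral` below are hypothetical host-side models, not the compiler transform, which reasons about the exponent at compile time.

```cpp
#include <cassert>
#include <cmath>

// True if Y is finite and has no fractional part.
bool isIntegral(double Y) { return std::isfinite(Y) && std::trunc(Y) == Y; }

// Integer power by repeated squaring; negative exponents use the reciprocal.
double pown(double X, int N) {
  long long E = N;
  if (E < 0)
    E = -E;
  double Result = 1.0;
  double Base = X;
  while (E != 0) {
    if (E & 1)
      Result *= Base;
    Base *= Base;
    E >>= 1;
  }
  return N < 0 ? 1.0 / Result : Result;
}

// Models the rewrite: the caller has already established that Y is integral.
double powKnownIntegral(double X, double Y) {
  assert(isIntegral(Y) && std::fabs(Y) <= 1e9 && "exponent must be a small integer");
  return pown(X, static_cast<int>(Y));
}
```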
2023-09-01 08:22:16 -04:00
Matt Arsenault
deefda7074 AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32
These codegen correctly but f64 doesn't. This prevents losing fast
math flags on the way to the underlying intrinsic.

https://reviews.llvm.org/D158997
2023-09-01 08:22:16 -04:00
Matt Arsenault
dac8f974b5 AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion
https://reviews.llvm.org/D158996
2023-09-01 08:22:16 -04:00