686 Commits

Author SHA1 Message Date
Matt Arsenault
a757c4e74e
CodeGen: Add subtarget to TargetLoweringBase constructor (#168620)
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
2025-11-19 19:18:13 +00:00
Sergei Barannikov
900c517919
[AMDGPU] TableGen-erate SDNode descriptions (#168248)
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.

Autogenerated node names start with "AMDGPUISD::", hence the changes in
the tests.

The few nodes defined in R600.td are *not* imported because TableGen
processes AMDGPU.td that doesn't include R600.td. Ideally, we would have
two sets of nodes, but that would require careful reorganization of td
files since some nodes are shared between AMDGPU/R600. Not sure if it
something worth looking into.

Some nodes fail validation, those are listed in
`AMDGPUSelectionDAGInfo::verifyTargetNode()`.

Part of #119709.

Pull Request: https://github.com/llvm/llvm-project/pull/168248
2025-11-17 00:10:07 +00:00
Jay Foad
72c69aefba
[AMDGPU] Make use of getFunction and getMF. NFC. (#167872) 2025-11-14 11:00:57 +00:00
Sergei Barannikov
71927ddb63
[CodeGen] Delete two ComputeValueVTs overloads (NFC) (#166758)
Those have only a few uses.
2025-11-06 19:45:29 +03:00
LU-JOHN
f899893c19
[AMDGPU][NFC] Cleanly make 32-bit abs legal (#164837)
Cleanly make 32-bit abs legal only in SIISelLowering.cpp

Signed-off-by: John Lu <John.Lu@amd.com>
2025-10-23 15:21:30 -05:00
LU-JOHN
f7c9618d57
[AMDGPU] 32-bit ABS is a legal DAG node (#163907)
32-bit ABS can be lowered legally.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-10-17 16:11:59 -05:00
paperchalice
14a42e64cf
[AMDGPU] Remove NoInfsFPMath uses (#163028)
Only `ninf` should be used.
2025-10-13 19:15:49 +08:00
Janek van Oirschot
a584bd98a1
Revert "[AMDGPU] Elide bitcast fold i64 imm to build_vector" (#160325)
Reverts llvm/llvm-project#154115

Co-authored-by: ronlieb <ron.lieberman@amd.com>
2025-09-23 17:36:02 +01:00
Janek van Oirschot
341cdbc970
[AMDGPU] Elide bitcast fold i64 imm to build_vector (#154115)
Elide bitcast combine to build_vector in case of i64 immediate that can
be materialized through 64b mov
2025-09-16 16:44:51 +01:00
Chris Jackson
5e6564b098
[AMDGPU][SDAG] Legalise v2i32 or/xor/and instructions to make use of 64-bit wide instructions (#140694)
- Enable s_or_b64/s_and_b64/s_xor_b64 for v2i32. Add various additional
combines to make use of these newly legalised instructions.
- Update several tests and separate legacy r600 tests where necessary.
2025-09-11 13:32:44 +01:00
Diana Picus
018dc1b397
[AMDGPU] Tail call support for whole wave functions (#145860)
Support tail calls to whole wave functions (trivial) and from whole wave
functions (slightly more involved because we need a new pseudo for the
tail call return, that patches up the EXEC mask).

Move the expansion of whole wave function return pseudos (regular and
tail call returns) to prolog epilog insertion, since that's where we
patch up the EXEC mask.
2025-09-04 10:34:43 +02:00
Frederik Harwath
47793f9a73
[AMDGPU] Implement IR expansion for frem instruction (#130988)
This patch implements a correctly rounded expansion of the frem
instruction in LLVM IR. This is useful for target architectures for
which such an expansion is too involved to be implement in ISel
Lowering. The expansion is based on the code from the AMD device libs
and has been tested successfully against the OpenCL conformance tests on
amdgpu. The expansion is implemented in the preexisting "expand-fp"
pass. It replaces the expansion of "frem" in ISel for the amdgpu target;
it is enabled for targets which do not directly support "frem" and for
which no matching "fmod" LibCall is available.

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-09-03 16:27:15 +02:00
Adam Nemet
9b5502292d
[CG] Add VTs for v[567]i1 and v[567]f16 (#156523)
[recommit https://github.com/llvm/llvm-project/pull/151763 after fixing
https://github.com/llvm/llvm-project/issues/152150]

We already had corresponding f32 and i32 vector types for these sizes.

Also add VTs v[567]i8 and v[567]i16: these are needed by the Hexagon
backend which for each i1 vector types want to query information about
the corresponding i8 and i16 types in
HexagonTargetLowering::getPreferredHvxVectorAction.
2025-09-02 17:40:43 -07:00
paperchalice
595573d1ed
[AMDGPU] Remove ApproxFuncFPMath uses (#155578)
One of options in `resetTargetOptions`, this removes `ApproxFuncFPMath`
in AMDGPU part.
2025-08-28 11:09:01 +08:00
Simon Pilgrim
2a79ef66eb
[AMDGPU] canCreateUndefOrPoisonForTargetNode - BFE_I32/U32 can't create poison/undef (#154932)
Add AMDGPUTargetLowering::canCreateUndefOrPoisonForTargetNode handler
and tag BFE_I32/U32 nodes as they can only propagate poison, not create
poison/undef.

Fighting some of the remaining regressions in #152107
2025-08-22 12:14:45 +00:00
Matt Arsenault
1f0af171cd
AMDGPU: Fix using illegal extract_subvector indexes (#154098) 2025-08-20 00:35:09 +00:00
Gang Chen
ef68d1587d
[AMDGPU] upstream barrier count reporting part1 (#154409) 2025-08-19 16:42:31 -07:00
Stanislav Mekhanoshin
e7c2c80fa1
[AMDGPU] Combine prng(undef) -> undef (#154160) 2025-08-18 12:13:16 -07:00
Adam Nemet
124722bfe5
Revert "[CG] Add VTs for v[567]i1 and v[567]f16" (#152217)
Reverts llvm/llvm-project#151763

It caused: https://github.com/llvm/llvm-project/issues/152150
2025-08-05 16:47:50 -07:00
Paul Walker
94d374ab6c
[LLVM][CGP] Allow finer control for sinking compares. (#151366)
Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse grained by
not taking into account the differences between scalar and vector
compares. This PR extends the interface to take an EVT to allow finer
control.
    
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
2025-08-05 11:43:41 +01:00
Adam Nemet
300e41d72f
[CG] Add VTs for v[567]i1 and v[567]f16 (#151763)
We already had corresponding f32 and i32 vector types for these sizes.

Also add VTs v[567]i8 and v[567]i16:  these are needed by the Hexagon
backend which for each i1 vector types want to query information about
the corresponding i8 and i16 types in
HexagonTargetLowering::getPreferredHvxVectorAction.
2025-08-02 09:00:31 -07:00
paperchalice
8bacfb2538
[AMDGPU] Remove UnsafeFPMath uses (#151079)
Remove `UnsafeFPMath` in AMDGPU part, it blocks some bugfixes related to
clang and the ultimate goal is to remove `resetTargetOptions` method in
`TargetMachine`, see FIXME in `resetTargetOptions`.
See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast

https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-07-31 17:36:57 +08:00
Daniil Fukalov
e650c4b9ef
[NFC][AMDGPU] Move cmp+select arguments optimization to SIISelLowering. (#150929)
As requested in #148740.
2025-07-28 22:11:36 +02:00
Nikita Popov
fe0dbe0f29
[CodeGen] More consistently expand float ops by default (#150597)
These float operations were expanded for scalar f32/f64/f128, but not
for f16 and more problematically, not for vectors. A small subset of
them was separately set to expand for vectors.

Change these to always expand by default, and adjust targets to mark
these as legal where necessary instead.

This is a much safer default, and avoids unnecessary legalization
failures because a target failed to manually mark them as expand.

Fixes https://github.com/llvm/llvm-project/issues/110753.
Fixes https://github.com/llvm/llvm-project/issues/121390.
2025-07-28 09:46:00 +02:00
Jay Foad
28b85502eb
[AMDGPU] Remove some duplicated lines. NFC. (#128029) 2025-07-21 17:28:31 +01:00
Diana Picus
20d8398825
[AMDGPU] ISel & PEI for whole wave functions (#145858)
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.

At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.

This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing `%active`, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
  special handling of %active. Since the return may be in a different
  basic block, it's difficult to add the virtual reg for %active to
  SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
  which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-07-21 10:39:09 +02:00
Daniil Fukalov
b7f6abdd05
[AMDGPU] Try to reuse register with the constant from compare in v_cndmask (#148740)
For some targets, the optimization X == Const ? X : Y -> X == Const ?
Const : Y can cause extra register usage or redundant immediate encoding
for the constant in cndmask generated from the ternary operation.

This patch detects such cases and reuses the register from the compare
instruction that already holds the constant, instead of materializing it
again for cndmask.

The optimization avoids immediates that can be encoded into cndmask
instruction (including +-0.0), as well as !isNormal() constants.

The change is reworked on the base of #131146

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:18:44 +02:00
Stanislav Mekhanoshin
2d6534b7da
[AMDGPU] gfx1250 64-bit relocations and fixups (#148951) 2025-07-15 17:13:42 -07:00
Shilei Tian
d7ec80c897
[AMDGPU] Add support for v_tanh_bf16 on gfx1250 (#147425)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-14 16:30:18 -04:00
LU-JOHN
896d900c9b
[AMDGPU] Create hi-half of 64-bit ashr with mov of -1 (#146569)
When performing a 64-bit sra of a negative value with a shift range from
[32-63], create the hi-half with a move of -1.

Alive verification: https://alive2.llvm.org/ce/z/kXd7Ac

Also, preserve exact flag. Alive verification:
https://alive2.llvm.org/ce/z/L86tXf.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-07-09 12:02:51 -04:00
Dominik Steenken
acdf1c7526
[DAG] Add generic expansion for ISD::FCANONICALIZE nodes (#142105)
This PR takes the work previously done by @pawan-nirpal-031 on X86 in
#106370, and makes it available in common code. This should enable all
targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data
types.

Canonicalization is implemented here as multiplication by `1.0`, as
suggested in [the
docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).
2025-07-08 16:12:17 +01:00
LU-JOHN
543f948312
[AMDGPU] Preserve exact flag for lshr (#146744)
When reducing 64-bit lshr to 32-bit preserve exact flag.

Alive2 verification:  https://alive2.llvm.org/ce/z/LcnX7V

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-07-07 23:26:45 +09:00
Fabian Ritter
fa859ed8c6
[AMDGPU][NFC] Fix typo "store" -> "load" in comment for AMDGPUTLI::performLoadCombine (#147298) 2025-07-07 16:11:45 +02:00
LU-JOHN
f2991bf791
[AMDGPU] Convert 64-bit sra to 32-bit if shift amt >= 32 (#144421)
Use KnownBits to convert 64-bit sra to 32-bit sra.

Scaled-down alive2 verification with 16/8-bit types:
https://alive2.llvm.org/ce/z/LamASk

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-06-26 14:22:59 -04:00
Matt Arsenault
4be4b82e74
AMDGPU: Use reportFatalUsageError for unhandled calling conventions (#145261)
Should switch this to DiagnosticInfo and use the default calling
convention, but that would require passing in the context.
2025-06-23 15:30:13 +09:00
Matt Arsenault
dd4776d429
AMDGPU: Remove AMDGPUInstrInfo class (#144984)
This was never constructed and only provided one static helper
function.
2025-06-20 18:26:56 +09:00
LU-JOHN
c4caf00bfb
[AMDGPU] Convert more 64-bit lshr to 32-bit if shift amt>=32 (#138204)
Convert vector 64-bit lshr to 32-bit if shift amt is known to be >= 32.
Also convert scalar 64-bit lshr to 32-bit if shift amt is variable but
known to be >=32.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-06-13 17:03:06 +09:00
LU-JOHN
f88a9a32d9
[AMDGPU] Extend SRA i64 simplification for shift amts in range [33:62] (#138913)
Extend sra i64 simplification to shift constants in range [33:62]. Shift
amounts 32 and 63 were already handled.
New testing for shift amts 33 and 62 added in sra.ll. Changes to other
test files were to adapt previous test results to this extension.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-05-30 16:21:38 +02:00
Justin Bogner
b7bb256703
Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.

We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
2025-05-28 12:26:39 -07:00
Pierre van Houtryve
fd85ffb4c4
[AMDGPU] Handle min/max in isNarrowingProfitable (#140206)
Introduces a slight regression in some cases but it'll even out once we
disable the promotion in CGP.
2025-05-16 10:16:44 +02:00
Matt Arsenault
797a580b6a
AMDGPU: Use poison instead of undef in more lowerings (#139208) 2025-05-09 18:18:11 +02:00
Matt Arsenault
912df60b08
AMDGPU: Handle minimumnum/maximumnum in fneg combines (#139133) 2025-05-09 08:07:01 +02:00
Brox Chen
9d907a2bb1
AMDGPU][True16][CodeGen] FP_Round f64 to f16 in true16 (#128911)
Update the f64 to f16 lowering for targets which support f16 types. 

For unsafe mode, lowered to two FP_ROUND. (This patch
https://reviews.llvm.org/D154528 stops from combining these two FP_ROUND
back). In safe mode, select LowerF64ToF16 (round-to-nearest-even
rounding mode)
2025-05-08 13:30:09 -04:00
Matt Arsenault
003e501487
AMDGPU: Fix gcc -Wenum-compare warning (#138529) 2025-05-05 16:26:06 +02:00
Simon Pilgrim
a99e055030
[DAG] shouldReduceLoadWidth - add optional<unsigned> byte offset argument (#136723)
Based off feedback for #129695 - we need to be able to determine the
load offset of smaller loads when trying to determine whether a multiple
use load should be split (in particular for AVX subvector extractions).

This patch adds a std::optional<unsigned> ByteOffset argument to
shouldReduceLoadWidth calls for where we know the constant offset to
allow targets to make use of it in future patches.
2025-04-23 12:30:27 +01:00
Simon Pilgrim
64ffecfc43
[DAG] isKnownNeverNaN - add DemandedElts element mask to isKnownNeverNaN calls (#135952)
Matches what we've done for computeKnownBits etc. to improve vector handling
2025-04-18 09:24:02 +01:00
Juan Manuel Martinez Caamaño
0375ef07c3
[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741)
This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as
a 4-bit signed integer and returns `0.0625f * src`.

SWDEV-518861
2025-04-02 19:51:40 +02:00
Tim Gymnich
1d0005a69a
[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466)
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than
just `KnownBits` in the future
2025-03-29 11:51:29 +01:00
LU-JOHN
827f2ad643
AMDGPU: Convert vector 64-bit shl to 32-bit if shift amt >= 32 (#132964)
Convert vector 64-bit shl to 32-bit if shift amt is known to be >= 32.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-03-28 23:46:35 +07:00
Diana Picus
e17b3cdfb3
[AMDGPU] Dynamic VGPR support for llvm.amdgcn.cs.chain (#130094)
The llvm.amdgcn.cs.chain intrinsic has a 'flags' operand which may
indicate that we want to reallocate the VGPRs before performing the
call.

A call with the following arguments:
```
llvm.amdgcn.cs.chain %callee, %exec, %sgpr_args, %vgpr_args,
  /*flags*/0x1, %num_vgprs, %fallback_exec, %fallback_callee
```
is supposed to do the following:
- copy the SGPR and VGPR args into their respective registers
- try to change the VGPR allocation
- if the allocation has succeeded, set EXEC to %exec and jump to
%callee, otherwise set EXEC to %fallback_exec and jump to
%fallback_callee

This patch implements the dynamic VGPR behaviour by generating an
S_ALLOC_VGPR followed by S_CSELECT_B32/64 instructions for the EXEC and
callee. The rest of the call sequence is left undisturbed (i.e.
identical to the case where the flags are 0 and we don't use dynamic
VGPRs). We achieve this by introducing some new pseudos
(SI_CS_CHAIN_TC_Wn_DVGPR) which are expanded in the SILateBranchLowering
pass, just like the simpler SI_CS_CHAIN_TC_Wn pseudos. The main reason
is so that we don't risk other passes (particularly the PostRA
scheduler) introducing instructions between the S_ALLOC_VGPR and the
jump. Such instructions might end up using VGPRs that have been
deallocated, or the wrong EXEC mask. Once the whole backend treats
S_ALLOC_VGPR and changes to EXEC as barriers for instructions that use
VGPRs, we could in principle move the expansion earlier (but in the
absence of a good reason for that my personal preference is to keep it
later in order to make debugging easier).

Since the expansion happens after register allocation, we're careful to
select constants to immediate operands instead of letting ISel generate
S_MOVs which could interfere with register allocation (i.e. make it look
like we need more registers than we actually do).

For GFX12, S_ALLOC_VGPR only works in wave32 mode, so we bail out during
ISel in wave64 mode. However, we can define the pseudos for wave64 too
so it's easy to handle if future generations support it.

---------

Co-authored-by: Ana Mihajlovic <Ana.Mihajlovic@amd.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-03-20 08:38:04 +01:00