166 Commits

Author SHA1 Message Date
Matt Arsenault
f48425edca
AMDGPU: Match fract pattern with swapped edge case check (#189081)
A fract implementation can equivalently be written as
  r = fmin(x - floor(x))
  r = isnan(x) ? x : r;
  r = isinf(x) ? 0.0 : r;

or:
  r = fmin(x - floor(x));
  r = isinf(x) ? 0.0 : r;
  r = isnan(x) ? x : r;

Previously this only matched the previous form. Match
the case where the isinf check is the inner clamp. There are
a few more ways to write this pattern (e.g., move the clamp of
infinity to the input) but I haven't encountered that in the wild.

The existing code seems to be trying too hard to match noncanonical
variants of the pattern. Only handles the result that all 4 permutations
of compare and select produce out of instcombine.
2026-03-31 09:13:58 +02:00
Matt Arsenault
5688aca96e
AMDGPU: Simplify synthesis of nextdown(1.0) constant (#189039) 2026-03-27 16:42:01 +00:00
Matt Arsenault
dba3de54a2
AMDGPU: Allow poison vector elts in fract pattern (#188991) 2026-03-27 13:59:28 +00:00
Jay Foad
79d1a2c418
[AMDGPU] Standardize on using AMDGPU::getNullPointerValue. NFC. (#187037)
AMDGPUTargetMachine also had a static method which did the same thing.
Remove it so that we have a single source of truth.
2026-03-17 17:08:16 +00:00
Matt Arsenault
d72bc0903f
AMDGPU: Use fpmath metadata on f16 log/log10 intrinsics (#180489)
result by default, and the old expansion with the afn flag. The
old result was good enough for OpenCL conformance, so consider
the fpmath metadata and use the fast path. This is done in
AMDGPUCodeGenPrepare for the same reason that sqrt is handled here,
which is the DAG does not have a way to access fpmath metadata
from the original instruction.

This is not yet of practical use, because the log calls sourced
from OpenCL are not actually marked with this metadata and there
isn't a method to produce it from the source languages.
2026-02-17 09:58:43 +01:00
Matt Arsenault
80662c1de1
AMDGPU: Use SimplifyQuery in AMDGPUCodeGenPrepare (#179133)
Enables assumes in more contexts. Of particular interest is the
nan check for the fract pattern.

The device libs f32 and s64 sin implementations have a range check,
and inside the large path this pattern appears. After a small patch
to invert this check to send nans down the small path, this will
enable the fold unconditionally on the large path.
2026-02-02 09:33:16 +01:00
Teja Alaghari
b595849e7c
[AMDGPU]: Rewrite mbcnt_lo/mbcnt_hi to work item ID where applicable (#160496)
This PR aims to optimize `llvm.amdgcn.mbcnt.lo` and
`llvm.amdgcn.mbcnt.hi` intrinsic patterns into simpler `workitem.id.x`
operations when work group sizes are known at compile time for
**improving performance of lane ID calculations**.

**visitMbcntLo:**
- Simple replacement: When `workgroup_size == wave_size` → Replace with
`workitem.id.x`
- Bitmask optimization: When work group evenly splits into waves →
Replace with `workitem.id.x & (wave_size - 1)`

**visitMbcntHi:**
- Copy optimization: On wave32, `mbcnt.hi(mask, val)` → `val` (upper 32
bits are always 0)
- Full pattern optimization: `mbcnt.hi(~0, mbcnt.lo(~0, 0))` → Replace
with `workitem.id.x`


**Example 1: Simple Replacement**
```llvm
; Before:
%a = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0); With workgroup_size = 32
; After:
%a = call i32 @llvm.amdgcn.workitem.id.x()
```

**Example 2: Bitmask Optimization**
```llvm
; Before:
%a = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0); With workgroup_size = 64 (2 waves)
; After:
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%a = and i32 %tid, 31
```

**Example 3: Copy Optimization**
```llvm
; Before:
%a = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %val); With workgroup_size = 32
ret i32 %a
; After:
ret i32 %val
```

**Example 4: Full Pattern Optimization**
```llvm
; Before:
%a = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
%b = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %a); With workgroup_size = 64
; After:
%b = call i32 @llvm.amdgcn.workitem.id.x()

; Before:
%a = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
%b = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %a); With workgroup_size = 48 (Partial mask)
; After:
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%b = and i32 %tid, 31
```
2026-01-12 20:18:07 +01:00
Matt Arsenault
7d7d063c9c
AMDGPU: Stop requiring afn for f32 rsq formation (#172082)
We were checking for afn or !fpmath attached to the sqrt. We
are not trying to replace a correctly rounded rsqrt; we're replacing
the two correctly rounded operations with the contracted operation.
It's net a better precision, so contract on both instructions should
be sufficient. Both the contracted and uncontracted sequences pass
the OpenCL conformance test, with a lower maximum error contracted.
2025-12-23 18:16:15 +01:00
Matt Arsenault
2c841b74bd
AMDGPU: Introduce f64 rsq pattern in AMDGPUCodeGenPrepare (#172053) 2025-12-22 15:49:59 +01:00
Jim Lin
48a9b07264
[AMDGPU] Remove unused functions isSigned. NFC (#169750)
These have been unused since
https://github.com/llvm/llvm-project/pull/145483.
2025-11-27 01:32:26 +00:00
Pierre van Houtryve
e26058cefc
[AMDGPU] Use reverse iteration in CodeGenPrepare (#145484)
In order to make this easier, I also removed all "removeFromParent"
calls from the visitors, instead adding instructions
to a set of instructions to delete once the function has been visited.
This avoids crashes due to functions deleting their operands. In theory
we could allow functions to delete the
instruction they visited (and only that one) but I think having one
idiom for everything is less error-prone.

Fixes #140219
2025-10-10 13:00:40 +02:00
Ivan Kosarev
faca8c9ed4
[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769)
Saves around 125-210 MB of compilation memory usage per source for
roughly one third of our backend sources, ~60 MB on average.
2025-08-22 10:05:06 +01:00
paperchalice
8bacfb2538
[AMDGPU] Remove UnsafeFPMath uses (#151079)
Remove `UnsafeFPMath` in AMDGPU part, it blocks some bugfixes related to
clang and the ultimate goal is to remove `resetTargetOptions` method in
`TargetMachine`, see FIXME in `resetTargetOptions`.
See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast

https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-07-31 17:36:57 +08:00
Pierre van Houtryve
29e14c3b44
[AMDGPU] Remove widen-16-bit-ops from CGP (#145483)
This was already off by default so there is no codegen change.
2025-07-16 10:14:19 +02:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
2025-06-03 17:12:24 +01:00
Simon Pilgrim
111effe05e AMDGPUCodeGenPrepare.cpp - fix MSVC operator precedence warning. NFC. 2025-06-02 09:47:52 +01:00
Matt Arsenault
cc8d253f39
AMDGPU: Handle other fmin flavors in fract combine (#141987)
Since the input is either known not-nan, or we have explicit use
code checking if the input is a nan, any of the 3 is valid to match.
2025-05-29 22:11:01 +02:00
anjenner
a36cb01ea7
[AMDGPU] Handle CreateBinOp not returning BinaryOperator (#137791)
AMDGPUCodeGenPrepareImpl::visitBinaryOperator() calls
Builder.CreateBinOp() and casts the resulting Value as a BinaryOperator
without checking, leading to an assert failure in a case found by
fuzzing. In this case, the operands are constant and CreateBinOp does
constant folding so returns a Constant instead of a BinaryOperator.
2025-05-29 19:10:35 +02:00
Kazu Hirata
1e8e662174
[AMDGPU] Remove unused includes (NFC) (#141376)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 14:48:46 -07:00
Pierre van Houtryve
aacebaeab5
[AMDGPU] Do not promote uniform i16 operations to i32 in CGP (#140208)
For the majority of cases, this is a neutral or positive change.
There are even testcases that greatly benefit from it, but some regressions are possible.
There is #140040 for GlobalISel that'd need to be fixed but it's only a one instruction regression and I think it can be fixed later.

Solves #64591
2025-05-16 10:31:03 +02:00
Craig Topper
123758b1f4
[IRBuilder] Add versions of createInsertVector/createExtractVector that take a uint64_t index. (#138324)
Most callers want a constant index. Instead of making every caller
create a ConstantInt, we can do it in IRBuilder. This is similar to
createInsertElement/createExtractElement.
2025-05-02 16:10:18 -07:00
Jay Foad
9060ca0191
[AMDGPU] Check for nonnull loads feeding addrspacecast (#138184)
Handle nonnull loads just like nonnull arguments when checking for
addrspacecasts that are known never null.
2025-05-02 12:54:22 +01:00
Jay Foad
886f1199f0
[AMDGPU] Use variadic isa<>. NFC. (#137016) 2025-04-24 08:19:09 +01:00
Shoreshen
121cd7c6f0
Re apply 130577 narrow math for and operand (#133896)
Re-apply https://github.com/llvm/llvm-project/pull/130577

Which is reverted in https://github.com/llvm/llvm-project/pull/133880

The old application failed in address sanitizer due to
`tryNarrowMathIfNoOverflow` was called after `I.eraseFromParent();` in
`AMDGPUCodeGenPrepareImpl::visitBinaryOperator`, it create a use after
free failure.

To fix this, `tryNarrowMathIfNoOverflow` will be called before and
directly return if `tryNarrowMathIfNoOverflow` result in true.
2025-04-17 17:03:32 +08:00
Rahul Joshi
a3754ade63
[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)
- Remove calls to pass initialization from pass constructors.
- https://github.com/llvm/llvm-project/issues/111767
2025-04-07 17:27:50 -07:00
Shoreshen
7f14b2a9eb
Revert "[AMDGPU][CodeGenPrepare] Narrow 64 bit math to 32 bit if profitable" (#133880)
Reverts llvm/llvm-project#130577
2025-04-01 17:37:02 +08:00
Shoreshen
145b4a3950
[AMDGPU][CodeGenPrepare] Narrow 64 bit math to 32 bit if profitable (#130577)
For Add, Sub, Mul with Int64 type, if profitable, then do:
1. Trunc operands to Int32 type
2. Apply 32 bit Add/Sub/Mul
3. Zext to Int64 type
2025-04-01 11:18:17 +08:00
Rahul Joshi
74b7abf154
[IRBuilder] Add new overload for CreateIntrinsic (#131942)
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
2025-03-31 08:10:34 -07:00
Tim Gymnich
049f179606
[Analysis][NFC] Extract KnownFPClass (#133457)
- extract KnownFPClass for future use inside of GISelKnownBits

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-03-28 18:10:02 +01:00
Jay Foad
44607666b3
[AMDGPU] Simplify conditional expressions. NFC. (#129228)
Simplfy `cond ? val : false` to `cond && val` and similar.
2025-03-03 10:40:49 +00:00
choikwa
de12836f28
[AMDGPU] Rework getDivNumBits API (#119768)
Rework involves below:
- Return unsigned value, the number of div/rem bits actually needed.
- Change from AtLeast(SignBits) to MaxDivBits hint.
- Use MaxDivBits hint for unsigned case.
- Remove unnecessary second early exit.

Mostly NFC changes.
2025-01-09 10:08:13 -05:00
choikwa
8d2e611802
[AMDGPU] Calculate getDivNumBits' AtLeast using bitwidth (#121758)
Previously in shrinkDivRem64, it used fixed value 32 for AtLeast which
meant that <64bit divisions would be rejected from shrinking since logic
depended only on number of sign bits. I.e. 'idiv i48 %0, %1' would
return 24 for number of sign bits if %0,%1 both had 24 division bits,
and was rejected.
2025-01-07 01:31:09 -05:00
Ramkumar Ramachandra
4a0d53a0b0
PatternMatch: migrate to CmpPredicate (#118534)
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.

This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
2024-12-13 14:18:33 +00:00
choikwa
463e93b95f
Reapply [AMDGPU] prevent shrinking udiv/urem if either operand exceeds signed max (#119325)
This reverts commit 254d206ee2a337cb38ba347c896f7c6a14c7f218.

+Added a fix in ExpandDivRem24 to disqualify if DivNumBits exceed 24.

Original commit & msg:
ce6e955ac374f2b86cbbb73b2f32174dffd85f25.
Handle signed and unsigned path differently in getDivNumBits. Using
computeKnownBits, this rejects shrinking unsigned div/rem if operands
exceed signed max since we know NumSignBits will be always 0.
2024-12-12 15:24:34 -05:00
Joseph Huber
254d206ee2 Revert "Reapply "[AMDGPU] prevent shrinking udiv/urem if either operand is in… (#118928)"
This reverts commit 509893b58ff444a6f080946bd368e9bde7668f13.

This broke the libc build again https://lab.llvm.org/buildbot/#/builders/73/builds/9787.
2024-12-09 08:10:49 -06:00
choikwa
509893b58f
Reapply "[AMDGPU] prevent shrinking udiv/urem if either operand is in… (#118928)
… (SignedMax,UnsignedMax] (#116733)"

This reverts commit 905e831f8c8341e53e7e3adc57fd20b8e08eb999.

Handle signed and unsigned path differently in getDivNumBits. Using
computeKnownBits, this rejects shrinking unsigned div/rem if operands
exceed signed max since we know NumSignBits will be always 0.

Rebased and re-attempt after first one was reverted due to unrelated
failure in LibC (should be fixed by now I'm told).
2024-12-06 19:14:39 -05:00
Jay Foad
9ad09b2930
[AMDGPU] Refine AMDGPUCodeGenPrepareImpl class. NFC. (#118461)
Use references instead of pointers for most state, initialize it all in
the constructor, and common up some of the initialization between the
legacy and new pass manager paths.
2024-12-03 15:31:25 +00:00
Jay Foad
3923e0451a
[AMDGPU] Preserve all analyses if nothing changed (#117994) 2024-11-28 14:33:05 +00:00
Joseph Huber
905e831f8c Revert "[AMDGPU] prevent shrinking udiv/urem if either operand is in (SignedMax,UnsignedMax] (#116733)"
This reverts commit b8e1d4dbea8905e48d51a70bf75cb8fababa4a60.

Causes failures on the `libc` test suite https://lab.llvm.org/buildbot/#/builders/73/builds/8871
2024-11-20 18:21:10 -06:00
choikwa
b8e1d4dbea
[AMDGPU] prevent shrinking udiv/urem if either operand is in (SignedMax,UnsignedMax] (#116733)
Do this by using ComputeKnownBits and checking for !isNonNegative and
isUnsigned. This rejects shrinking unsigned div/rem if operands exceed
smax_bitwidth since we know NumSignBits will be always 0.
2024-11-20 11:22:09 -05:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Matt Arsenault
79516ddbee
AMDGPU: Fix assert from wrong address space size assumption (#97267)
This was assuming the source address space was at least as large
as the destination of the cast. I'm not sure why this was casting
to begin with; the assumption seems to be the source
address space from the root addrspacecast matches the underlying
object so directly check that.

Fixes #97457
2024-07-02 23:18:25 +02:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Shilei Tian
0a43ca731b
[AMDGPU] Fix missing IsExact flag when expanding vector binary operator (#86712) 2024-03-27 17:40:58 -04:00
Peter Rong
4a026b5092
[AMDGCN] Use ZExt when handling indices in insertment element (#85718)
When i1 true is used as an index, SExt extends it to i32 -1. This would
cause BitVector to overflow.
The language manual have specified that the index shall be treated as an
unsigned number, this patch fixes that.
(https://llvm.org/docs/LangRef.html#insertelement-instruction)

This patch fixes #85717

---------

Signed-off-by: Peter Rong <PeterRong96@gmail.com>
2024-03-19 21:44:08 -07:00
Orlando Cazalet-Hyams
3ab1481f9a
[RemoveDIs] Use getFirstNonPHIIt to fix crash #85472 (#85618) 2024-03-18 09:57:22 +00:00
Pierre van Houtryve
756166e342
[AMDGPU] Improve detection of non-null addrspacecast operands (#82311)
Use IR analysis to infer when an addrspacecast operand is nonnull, then
lower it to an intrinsic that the DAG can use to skip the null check.

I did this using an intrinsic as it's non-intrusive. An alternative
would have been to allow something like `!nonnull` on `addrspacecast`
then lower that to a custom opcode (or add an operand to the
addrspacecast MIR/DAG opcodes), but it's a lot of boilerplate for just
one target's use case IMO.

I'm hoping that when we switch to GISel that we can move all this logic
to the MIR level without losing info, but currently the DAG doesn't see
enough so we need to act in CGP.

Fixes: SWDEV-316445
2024-03-01 14:01:10 +01:00
choikwa
e5638c5a00
[AMDGPU] Use correct number of bits needed for div/rem shrinking (#80622)
There was an error where dividend of type i64 and actual used number of
bits of 32 fell into path that assumes only 24 bits being used. Check
that AtLeast field is used correctly when using computeNumSignBits and
add necessary extend/trunc for 32 bits path.

Regolden and update testcases.

@jrbyrnes @bcahoon @arsenm @rampitec
2024-02-06 21:32:28 +05:30