86534 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
ea14834966
[AMDGPU] Per-subtarget DPP instruction classification (#153096)
This is NFCI at this point.
2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin
b9ecee9d47
[AMDGPU] Fix DPP combining into V_BITOP3_B32 (#153083) 2025-08-11 15:39:02 -07:00
Princeton Ferro
00369230c1
[NVPTX] expand extractelt(v2f32) with dynamic index (#153078)
Addresses
https://github.com/llvm/llvm-project/pull/126337#issuecomment-3162756334
2025-08-11 14:42:29 -07:00
Kaitlin Peng
cbc8f650ea
[HLSL][DirectX] Fix dot2add DXIL operation to use float overload (#152781)
Fixes #152585.

The `dot2add` DXILOpFunction should be `dx.op.dot2AddHalf.f32` (i.e. it
has [a single overload that's a
float](https://github.com/microsoft/DirectXShaderCompiler/blob/main/utils/hct/hctdb.py#L3960),
rather than no overloads). It was also being defined for too low of a
DXIL version - [dxc says
SM6.4](https://github.com/microsoft/DirectXShaderCompiler/blob/main/utils/hct/hctdb.py#L740).
2025-08-11 13:03:24 -07:00
Nikita Popov
2bf2b6b24d
[AVR] Only specify one legal int string in data layout (#153010)
There should not be both `n8` and `n16:8`. This is a single list of
legal integers. Additionally, this should use the standard order in
increasing size `n8:16`.
2025-08-11 20:42:28 +02:00
Peter Collingbourne
a9227316bf MC: Introduce R_AARCH64_PATCHINST relocation type.
The R_AARCH64_PATCHINST relocation type is to support deactivation
symbols. For more information, see the RFC:
https://discourse.llvm.org/t/rfc-deactivation-symbols/85556

Part of the AArch64 psABI extension:
https://github.com/ARM-software/abi-aa/issues/340
2025-08-11 11:31:36 -07:00
Augie Fackler
e0f5e2850f
[Xtensa] Fix function signature after e92b7e9641 (#153054)
Noticed this in the Rust/LLVM-HEAD CI, which for some obscure reason
bothers to build Xtensa.
2025-08-11 14:24:29 -04:00
Helena Kotas
fb1035cfb4
[DirectX] Fix resource binding analysis incorrectly removing duplicates (#152253)
The resource binding analysis was incorrectly reducing the size of the
`Bindings` vector by one element after sorting and de-duplication. This
led to an inaccurate setting of the `HasOverlappingBinding` flag in the
`DXILResourceBindingInfo` analysis, as the truncated vector no longer
reflected the true binding state.

This update corrects the shrink logic and introduces an `assert` in the
`DXILPostOptimizationValidation` pass. The assertion will trigger if
`HasOverlappingBinding` is set but no corresponding error is detected,
helping catch future inconsistencies.

The bug surfaced when the `srv_metadata.hlsl` and `uav_metadata.hlsl`
tests were updated to include unbounded resource arrays as part of
https://github.com/llvm/llvm-project/issues/145422. These updated test
files are included in this PR, as they would cause the new assertion to
fire if the original issue remained unresolved.

Depends on #152250
2025-08-11 10:53:00 -07:00
Luke Lau
81b576e66b
[RISCV] Cost casts with illegal types that can't be legalized (#153030)
If we have a floating point vector and no zve32f/zve64f/zve64d, we can
end up with an invalid type-legalization cost from
getTypeLegalizationCost.

Previously this triggered an assertion that the type must have been
legalized if the "legal" type is a vector, but in this case when it's
not possible to legalize the original type is spat back out.

This fixes it by just checking that the legalization cost is valid.

We don't have much testing for zve64x, so we may have other places in
the cost model with this issue.

Fixes #153008
2025-08-12 00:29:39 +08:00
Wesley Wiser
40a469f79a
Reapply "[X86] Correct 32-bit immediate assertion and fix 64-bit lowering for huge frame offsets" (#152239)
The first commit is identical to
69bec0afbb8f2aa0021d18ea38768360b16583a9.

The second commit fixes the instruction verification failures by
replacing the erroneous instruction with a trap after the error is
reported and adds `-verify-machineinstrs` to the tests added in the
original PR to catch the issue sooner.

After that change, all tests pass with both
`LLVM_ENABLE_EXPENSIVE_CHECKS={On,Off}`.

cc @RKSimon @e-kud @phoebewang @arsenm as reviewers on the original PR
2025-08-11 21:23:44 +05:30
Matt Arsenault
9a293530d9
AMDGPU: Handle multiple AGPR MFMA rewrites (#147975)
I have this firing on one of the real examples, need to
produce the tests and check a few edge cases
2025-08-11 23:10:35 +09:00
Craig Topper
f55281ac38
[RISCV] Add a high half PACKW+PACK pattern for RV64. (#152760)
Similar to the PACKH+PACK pattern for RV32. We can end up with the
shift left by 32 neeed by our PACK pattern hidden behind an OR that
packs 2 half words.
2025-08-11 07:37:55 -05:00
Luke Lau
acb86fb9e0
[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657)
In some places we were passing the type of value being accessed, in
other cases we were passing the type of the pointer for the access.

The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.

This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.

No target actually checks the contents of the type passed, only to see
if it's a vector or not, so this shouldn't have an effect.
2025-08-11 18:00:12 +08:00
Sander de Smalen
2ad1d77b17
[AArch64] Match constants in SelectSMETileSlice (#151494)
If the slice is a constant then it should try to use `WZR + <imm>`
addressing mode if the constant fits the range.
2025-08-11 10:19:26 +01:00
Nikita Popov
e92b7e9641
[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from a i128 argument.

The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).

This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet, this just does
the mechanical changes to thread through the new argument.
2025-08-11 08:57:53 +02:00
AZero13
e6b4daf48c
[AArch64] Support MI and PL (#150314)
Now, why would we want to do this?

There are a small number of places where this works:
1. It helps peepholeopt when less flag checking.
2. It allows the folding of things such as x - 0x80000000 < 0 to be
folded to cmp x, register holding this value
3. We can refine the other passes over time for this.
2025-08-11 07:41:38 +01:00
Matt Arsenault
1f86deb5a4 AMDGPU: Add debug printing for early exit if there are no AGPRs allocated 2025-08-11 09:27:05 +09:00
Trevor Gross
733fddb6f4
[AVR] Change half to use softPromoteHalfType (#152783)
The default `half` legalization has some issues with quieting NaNs and
carrying excess precision. As has been done for various other targets,
update AVR to use `softPromoteHalfType` which avoids these issues.

The most obvious corrected test below is `test_load_store`, which no
longer contains calls to extend and trunc (this passing through libcalls
means that `f16` does not round trip).

Fixes the AVR part of https://github.com/llvm/llvm-project/issues/97975
Fixes the AVR part of https://github.com/llvm/llvm-project/issues/97981
2025-08-10 11:39:27 +08:00
Alexander Richardson
87ad9122e5
[AMDGPULowerBufferFatPointers] Handle ptrtoaddr by extending the offset
Reviewed By: krzysz00

Pull Request: https://github.com/llvm/llvm-project/pull/139413
2025-08-09 16:28:12 -07:00
Craig Topper
7fb8630e71
[RISCV] Add another packh+packw pattern. (#152744)
If the upper 32 bits are demanded, we might have a sext_inreg in
the pattern on the byte shifted by 24. We can also match this case
since packw sign extends from bit 31.
2025-08-09 09:23:44 -05:00
Kazu Hirata
5ebb22de6a
[Mips] Remove an unnecessary cast (NFC) (#152837)
getZExtValue() already returns uint64_t.
2025-08-09 06:58:06 -07:00
Krzysztof Parzyszek
f89306fe74 [AVR] Fix build break with shared libraries
For example:

/usr/bin/ld: lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachi
ne.cpp.o: in function `llvm::TargetTransformInfoImplCRTPBase<llvm::AVRTT
IImpl>::~TargetTransformInfoImplCRTPBase()':
AVRTargetMachine.cpp:(.text._ZN4llvm31TargetTransformInfoImplCRTPBaseINS
_10AVRTTIImplEED2Ev[_ZN4llvm31TargetTransformInfoImplCRTPBaseINS_10AVRTT
IImplEED5Ev]+0x13): undefined reference to `llvm::TargetTransformInfoImp
lBase::~TargetTransformInfoImplBase()'

Add missing dependencies to CMakeLists.txt.
2025-08-09 08:36:59 -05:00
Stanislav Mekhanoshin
10e146a716
[AMDGPU] Fix out of bound physreg tuple condition. NFC. (#152777)
The end register of the tuple shall be below the last existing
register. The check does not work on something like {v[255:256]}.
Overall it works correctly because if fails later at the
getMatchingSuperReg() call.
2025-08-09 01:50:13 -07:00
Tom Vijlbrief
160f5ca0f5
[AVR][NFC] Split AVRTargetTransformInfo.h to AVRTargetTransformInfo.cpp (#152841) 2025-08-09 16:09:45 +08:00
Tom Vijlbrief
97f0ff0c80
[AVR] Fix Avr indvar detection and strength reduction (missed optimization) (#152028)
Fix https://github.com/llvm/llvm-project/issues/151080
2025-08-09 12:46:32 +08:00
Matt Arsenault
0a0f077b94
AMDGPU: Add missing static to cl::opt (#152747) 2025-08-09 08:33:09 +09:00
Deric C.
e13cb3e299
[DirectX] Update lifetime legalization to account for the removed size argument (#152791)
Fixes #152754 

- Fixes the ArgOperand index in `DXILOpLowering.cpp` used to obtain the
pointer operand of a lifetime intrinsic.
- Updates the tests
`llvm/test/CodeGen/DirectX/legalize-lifetimes-valver-1.5.ll`,
`llvm/test/CodeGen/DirectX/legalize-lifetimes-valver-1.6.ll`,
`llvm/test/CodeGen/DirectX/ShaderFlags/lifetimes-noint64op.ll`, and
`llvm/test/tools/dxil-dis/lifetimes.ll` to use the new size-less
lifetime intrinsic
- Removes lifetime intrinsics from the test
`llvm/test/CodeGen/DirectX/legalize-memset.ll` to be consistent with the
corresponding memcpy test which does not have lifetime intrinsics.
(Removal of lifetime intrinsics from tests like this was suggested here
in the past:
https://github.com/llvm/llvm-project/pull/139173#discussion_r2091778868)
- Rewrites the lifetime legalization functions in the EmbedDXILPass to
re-add the explicit size argument for DXIL
2025-08-08 14:32:27 -07:00
Min-Yih Hsu
c065ed3912
[RISCV] Add intrinsics for strided segment stores with fixed vectors (#152038)
These are the strided versions of `riscv.segN.store.mask` intrinsics.
2025-08-08 14:08:08 -07:00
Mikhail R. Gadelha
e91f68487c
[RISCV] Update SpacemiT-X60 vector fixed-point arithmetic latencies (#150517)
This PR adds hardware-measured latencies for all instructions defined in
Section 12 of the RVV specification: "Vector Fixed-Point Arithmetic
Instructions" to the SpacemiT-X60 scheduling model.
2025-08-08 11:57:35 -03:00
David Green
26b302fd8b [AArch64] Rename Cost -> PromotedCost to avoid shadowing error 2025-08-08 14:37:24 +01:00
David Green
7f1638efc1
[AArch64] Generalize costing for FP16 instructions (#150033)
This extracts the code for modelling a fp16 operation as
`fptrunc(fpop(fpext,fpext))` into a new function named
getFP16BF16PromoteCost so that it can be reused by the
arithmetic instructions. The function takes a lambda to
calculate the cost of the operation with the promoted type.
2025-08-08 13:40:07 +01:00
Lucas Ramirez
83c308f014
[AMDGPU][Scheduler] Consistent occupancy calculation during rematerialization (#149224)
The `RPTarget`'s way of determining whether VGPRs are beneficial to save
and whether the target has been reached w.r.t. VGPR usage currently
assumes, if `CombinedVGPRSavings` is true, that free slots in one VGPR
RC can always be used for the other. Implicitly, this makes the
rematerialization stage (only current user of `RPTarget`) follow a
different occupancy calculation than the "regular one" that the
scheduler uses, one that assumes that ArchVGPR/AGPR usage can be
balanced perfectly and at no cost, which is untrue in general. This
ultimately yields suboptimal rematerialization decisions that require
cross-VGPR-RC copies unnecessarily.

This fixes that, making the `RPTarget`'s internal model of occupancy
consistent with the regular one. The `CombinedVGPRSavings` flag is
removed, and a form of cross-VGPR-RC saving implemented only for unified
RFs, which is where it makes the most sense. Only when the amount of
free VGPRs in a given VGPR RC (ArchVPGR or AGPR) is lower than the
excess VGPR usage in the other VGPR RC does the `RPTarget` consider that
a pressure reduction in the former will be beneficial to the latter.
2025-08-08 14:26:04 +02:00
Diana Picus
a910a6a8b5
[AMDGPU] AsmPrinter: Unify arg handling (#151672)
When computing the number of registers required by entry functions, the
`AMDGPUAsmPrinter` needs to take into account both the register usage
computed by the `AMDGPUResourceUsageAnalysis` pass, and the number
of registers initialized by the hardware. At the moment, the way it
computes the latter is different for graphics vs compute, due to differences in
the implementation. For kernels, all the information needed is available in
the `SIMachineFunctionInfo`, but for graphics shaders we would iterate over
the `Function`  arguments in the `AMDGPUAsmPrinter`. This pretty much 
repeats some of the logic from instruction selection.

This patch introduces 2 new members to `SIMachineFunctionInfo`, one
for SGPRs and one for VGPRs. Both will be computed during instruction
selection and then used during `AMDGPUAsmPrinter`, removing the need
to refer to the `Function` when printing assembly.

This patch is NFC except for the fact that we now add the extra SGPRs
(VCC, XNACK etc) to the number of SGPRs computed for graphics entry points.
I'm not sure why these weren't included before. It would be nice if
someone could confirm if that was just an oversight or if we have some docs
somewhere that I haven't managed to find. Only one test is affected (its SGPR
usage increases because we now take into account the XNACK registers).
2025-08-08 12:00:37 +02:00
Graham Hunter
de72cca671
[CostModel] Provide a default model for histogram intrinsics (#149348)
Since we scalarize these intrinsics when the target does not support
them, we should model that for costing purposes.
2025-08-08 11:00:00 +01:00
Matt Arsenault
81f3ddf4a2
AMDGPU: Rewrite VGPR MFMAs to AGPR when directly copied to AGPR class (#152480) 2025-08-08 18:20:21 +09:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Cullen Rhodes
e9d71efb83
[AArch64] Mark [usp]mull, [us]addl, [us]abdl as commutative (#152158)
Fixes #61461.
2025-08-08 09:35:28 +01:00
Nikita Popov
18e4f775c3
[SystemZ] Remove incorrect areInlineCompatible hook (#152494)
This reverts https://github.com/llvm/llvm-project/pull/132976.

The PR incorrectly claimed that this makes inlining more liberal,
referencing the string comparison in TargetTransformInfoImpl.h.

However, the implementation that actually applies is the one in
BasicTTIImpl.h, which performs a feature subset comparison. As such,
this regressed inlining, most concerningly of functions without +vector
into functions with +vector.

Revert the change to restore the previous behavior.
2025-08-08 10:06:19 +02:00
Paul Murphy
5f864560a6
[PowerPC] fix lowering of SPILL_CRBIT on pwr9 and pwr10 (#146424)
If a copy exists between creation of a crbit and a spill, machine-cp
may delete the copy since it seems unaware of the relation between a cr
and crbit. A fix was previously made for the generic ppc64 lowering. It
should be applied to the pwr9 and pwr10 variants too.

Likewise, relax and extend the pwr8 test to verify pwr9 and pwr10
codegen too.

This fixes #143989.
2025-08-08 09:24:22 +02:00
David Green
229ab5aa2b
[AArch64] Drop flags from BSP pseudos (#151856)
This prevents cases where some of the operands match from hitting
verifier errors with kill flags. These nodes should have been removed
earlier in most cases.

Fixes the direct issue from #149380. #151855 cleans up the codegen.
2025-08-08 07:47:56 +01:00
Luke Lau
7074471593
[RISCV] Enable tail folding by default (#151681)
We have been tracking the performance of EVL tail folding in the loop
vectorizer on RISC-V for a while now, and after much hard work from
various contributors we think it should be generally profitable to
enable by default now.

With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU
2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30%
geomean codesize reduction on SPEC and TSVC, with no significant
regressions detected.

Now that we are early into the LLVM 22.x development cycle it seems like
a good time to enable it to catch any issues. There are still more EVL
related items of work being tracked in #123069, which should continue to
improve performance.
2025-08-08 14:26:23 +08:00
AZero13
6a425f1e54
[ARM] Have custom lowering for ucmp and scmp (#149315)
Limited to non-thumb1 for scmp at the moment, since there is no good way
to do it.
2025-08-08 06:51:18 +01:00
Jim Lin
b9ca01b746 [RISCV] Move the decoder table for XCV, Xqci and XRivos from standard section to vendor section. NFC 2025-08-08 11:18:18 +08:00
Fangrui Song
3769ce013b
MC: Refine ALIGN relocation conditions
Each section now tracks the index of the first linker-relaxable
fragment, enabling two changes:

* Delete redundant ALIGN relocations before the first linker-relaxable
  instruction in a section. The primary example is the offset 0
  R_RISCV_ALIGN relocation for a text section aligned by 4.
* For alignments larger than the NOP size after the first
  linker-relaxable instruction, ALIGN relocations are now generated, even in
  norelax regions. This fixes the issue #150159.

The new test llvm/test/MC/RISCV/Relocations/align-after-relax.s
verifies the required ALIGN in a norelax region following
linker-relaxable instructions.
By using a fragment index within the subsection (which is less than or
equal to the section's index), the implementation may generate redundant
ALIGN relocations in lower-numbered subsections before the first
linker-relaxable instruction.

align-option-relax.s demonstrates the ALIGN optimization.
Add an initial `call` to a few tests to prevent the ALIGN optimization.

---

When the alignment exceeds 2, we insert $alignment-2 bytes of NOPs, even
in non-RVC code. This enables non-RVC code following RVC code to handle
a 2-byte adjustment without requiring an additional state in MCSection
or AsmParser.

```
.globl _start
_start:
// GNU ld can relax this to  6505          lui     a0, 0x1
// LLD hasn't implemented this transformation.
  lui a0, %hi(foo)

.option push
.option norelax
.option norvc
// Now we generate R_RISCV_ALIGN with addend 2, even if this is a norvc region.
.balign 4
b0:
  .word 0x3a393837
.option pop
foo:
```

Pull Request: https://github.com/llvm/llvm-project/pull/150816
2025-08-07 19:16:58 -07:00
Stanislav Mekhanoshin
469863111f
[AMDGPU] Enable CodeGen for v_pk_fma_bf16 (#152578) 2025-08-07 16:19:59 -07:00
Stanislav Mekhanoshin
dddeb07c2e
[AMDGPU] Restrict packed math FP32 instructions to read only one SGPR per operand on gfx12+ (#152465)
Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used
as an operand, only one SGPR will be read for both the low and high
operations. As a result, the corresponding bits in `op_sel` and
`op_sel_hi` must be the same when the operand is an SGPR.

Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>

Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>
2025-08-07 16:13:34 -07:00
Stanislav Mekhanoshin
82046c7f33
[AMDGPU] Adjust hard clause rules for gfx1250 (#152592)
Change from GFX12: Relax S_CLAUSE rules to all all non-flat memory types
in
the same clause, and all Flat types in the same.

For VMEM/FLAT clause types now look like:

- Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async
- Flat: load, store, atomic
2025-08-07 14:59:31 -07:00
Stanislav Mekhanoshin
abc22f771e
[AMDGPU] Fix buffer addressing mode matching (#152584)
Starting in gfx1250, voffset and immoffset are zero-extended from 32
bits
to 45 bits before being added together.
2025-08-07 14:23:41 -07:00
Hood Chatham
b9c328480c
[clang][WebAssembly] Support reftypes & varargs in test_function_pointer_signature (#150921)
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).

I added tests for structs and unions which already work. With the
multivalue abi they crash in the backend, so I added a sema check that
rejects structs and unions for that abi.

It will also crash in the backend if passed an int128 or float128 type.
2025-08-07 13:07:04 -07:00
Stanislav Mekhanoshin
d09dbdabb9
[AMDGPU] bf16 clamp folding (#152573) 2025-08-07 12:59:50 -07:00