llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	ea14834966	[AMDGPU] Per-subtarget DPP instruction classification (#153096 ) This is NFCI at this point.	2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin	b9ecee9d47	[AMDGPU] Fix DPP combining into V_BITOP3_B32 (#153083 )	2025-08-11 15:39:02 -07:00
Princeton Ferro	00369230c1	[NVPTX] expand extractelt(v2f32) with dynamic index (#153078 ) Addresses https://github.com/llvm/llvm-project/pull/126337#issuecomment-3162756334	2025-08-11 14:42:29 -07:00
Kaitlin Peng	cbc8f650ea	[HLSL][DirectX] Fix `dot2add` DXIL operation to use float overload (#152781 ) Fixes #152585. The `dot2add` DXILOpFunction should be `dx.op.dot2AddHalf.f32` (i.e. it has [a single overload that's a float](https://github.com/microsoft/DirectXShaderCompiler/blob/main/utils/hct/hctdb.py#L3960), rather than no overloads). It was also being defined for too low of a DXIL version - [dxc says SM6.4](https://github.com/microsoft/DirectXShaderCompiler/blob/main/utils/hct/hctdb.py#L740).	2025-08-11 13:03:24 -07:00
Nikita Popov	2bf2b6b24d	[AVR] Only specify one legal int string in data layout (#153010 ) There should not be both `n8` and `n16:8`. This is a single list of legal integers. Additionally, this should use the standard order in increasing size `n8:16`.	2025-08-11 20:42:28 +02:00
Peter Collingbourne	a9227316bf	MC: Introduce R_AARCH64_PATCHINST relocation type. The R_AARCH64_PATCHINST relocation type is to support deactivation symbols. For more information, see the RFC: https://discourse.llvm.org/t/rfc-deactivation-symbols/85556 Part of the AArch64 psABI extension: https://github.com/ARM-software/abi-aa/issues/340	2025-08-11 11:31:36 -07:00
Augie Fackler	e0f5e2850f	[Xtensa] Fix function signature after e92b7e9641 (#153054 ) Noticed this in the Rust/LLVM-HEAD CI, which for some obscure reason bothers to build Xtensa.	2025-08-11 14:24:29 -04:00
Helena Kotas	fb1035cfb4	[DirectX] Fix resource binding analysis incorrectly removing duplicates (#152253 ) The resource binding analysis was incorrectly reducing the size of the `Bindings` vector by one element after sorting and de-duplication. This led to an inaccurate setting of the `HasOverlappingBinding` flag in the `DXILResourceBindingInfo` analysis, as the truncated vector no longer reflected the true binding state. This update corrects the shrink logic and introduces an `assert` in the `DXILPostOptimizationValidation` pass. The assertion will trigger if `HasOverlappingBinding` is set but no corresponding error is detected, helping catch future inconsistencies. The bug surfaced when the `srv_metadata.hlsl` and `uav_metadata.hlsl` tests were updated to include unbounded resource arrays as part of https://github.com/llvm/llvm-project/issues/145422. These updated test files are included in this PR, as they would cause the new assertion to fire if the original issue remained unresolved. Depends on #152250	2025-08-11 10:53:00 -07:00
Luke Lau	81b576e66b	[RISCV] Cost casts with illegal types that can't be legalized (#153030 ) If we have a floating point vector and no zve32f/zve64f/zve64d, we can end up with an invalid type-legalization cost from getTypeLegalizationCost. Previously this triggered an assertion that the type must have been legalized if the "legal" type is a vector, but in this case, when it's not possible to legalize, the original type is spat back out. This fixes it by just checking that the legalization cost is valid. We don't have much testing for zve64x, so we may have other places in the cost model with this issue. Fixes #153008	2025-08-12 00:29:39 +08:00
Wesley Wiser	40a469f79a	Reapply "[X86] Correct 32-bit immediate assertion and fix 64-bit lowering for huge frame offsets" (#152239 ) The first commit is identical to 69bec0afbb8f2aa0021d18ea38768360b16583a9. The second commit fixes the instruction verification failures by replacing the erroneous instruction with a trap after the error is reported and adds `-verify-machineinstrs` to the tests added in the original PR to catch the issue sooner. After that change, all tests pass with both `LLVM_ENABLE_EXPENSIVE_CHECKS={On,Off}`. cc @RKSimon @e-kud @phoebewang @arsenm as reviewers on the original PR	2025-08-11 21:23:44 +05:30
Matt Arsenault	9a293530d9	AMDGPU: Handle multiple AGPR MFMA rewrites (#147975 ) I have this firing on one of the real examples, need to produce the tests and check a few edge cases	2025-08-11 23:10:35 +09:00
Craig Topper	f55281ac38	[RISCV] Add a high half PACKW+PACK pattern for RV64. (#152760 ) Similar to the PACKH+PACK pattern for RV32. We can end up with the shift left by 32 needed by our PACK pattern hidden behind an OR that packs 2 half words.	2025-08-11 07:37:55 -05:00
Luke Lau	acb86fb9e0	[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657 ) In some places we were passing the type of value being accessed, in other cases we were passing the type of the pointer for the access. The most "involved" user is LoopVectorizationCostModel::getMemInstScalarizationCost, which is the only call site that passes in the SCEV, and it passes along the pointer type. This changes call sites to consistently pass the pointer type, and renames the arguments to clarify this. No target actually checks the contents of the type passed, only to see if it's a vector or not, so this shouldn't have an effect.	2025-08-11 18:00:12 +08:00
Sander de Smalen	2ad1d77b17	[AArch64] Match constants in SelectSMETileSlice (#151494 ) If the slice is a constant then it should try to use `WZR + <imm>` addressing mode if the constant fits the range.	2025-08-11 10:19:26 +01:00
Nikita Popov	e92b7e9641	[CodeGen] Provide original IR type to CC lowering (NFC) (#152709 ) It is common to have ABI requirements for illegal types: For example, two i64 argument parts that originally came from an fp128 argument may have a different call ABI than ones that came from a i128 argument. The current calling convention lowering does not provide access to this information, so backends come up with various hacks to support it (like additional pre-analysis cached in CCState, or bypassing the default logic entirely). This PR adds the original IR type to InputArg/OutputArg and passes it down to CCAssignFn. It is not actually used anywhere yet, this just does the mechanical changes to thread through the new argument.	2025-08-11 08:57:53 +02:00
AZero13	e6b4daf48c	[AArch64] Support MI and PL (#150314 ) Now, why would we want to do this? There are a small number of places where this works: 1. It helps peepholeopt, since there is less flag checking. 2. It allows expressions such as x - 0x80000000 < 0 to be folded to a cmp of x against a register holding this value. 3. We can refine the other passes over time for this.	2025-08-11 07:41:38 +01:00
Matt Arsenault	1f86deb5a4	AMDGPU: Add debug printing for early exit if there are no AGPRs allocated	2025-08-11 09:27:05 +09:00
Trevor Gross	733fddb6f4	[AVR] Change `half` to use `softPromoteHalfType` (#152783 ) The default `half` legalization has some issues with quieting NaNs and carrying excess precision. As has been done for various other targets, update AVR to use `softPromoteHalfType` which avoids these issues. The most obvious corrected test below is `test_load_store`, which no longer contains calls to extend and trunc (this passing through libcalls means that `f16` does not round trip). Fixes the AVR part of https://github.com/llvm/llvm-project/issues/97975 Fixes the AVR part of https://github.com/llvm/llvm-project/issues/97981	2025-08-10 11:39:27 +08:00
Alexander Richardson	87ad9122e5	[AMDGPULowerBufferFatPointers] Handle ptrtoaddr by extending the offset Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139413	2025-08-09 16:28:12 -07:00
Craig Topper	7fb8630e71	[RISCV] Add another packh+packw pattern. (#152744 ) If the upper 32 bits are demanded, we might have a sext_inreg in the pattern on the byte shifted by 24. We can also match this case since packw sign extends from bit 31.	2025-08-09 09:23:44 -05:00
Kazu Hirata	5ebb22de6a	[Mips] Remove an unnecessary cast (NFC) (#152837 ) getZExtValue() already returns uint64_t.	2025-08-09 06:58:06 -07:00
Krzysztof Parzyszek	f89306fe74	[AVR] Fix build break with shared libraries For example: /usr/bin/ld: lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o: in function `llvm::TargetTransformInfoImplCRTPBase<llvm::AVRTTIImpl>::~TargetTransformInfoImplCRTPBase()': AVRTargetMachine.cpp:(.text._ZN4llvm31TargetTransformInfoImplCRTPBaseINS_10AVRTTIImplEED2Ev[_ZN4llvm31TargetTransformInfoImplCRTPBaseINS_10AVRTTIImplEED5Ev]+0x13): undefined reference to `llvm::TargetTransformInfoImplBase::~TargetTransformInfoImplBase()' Add missing dependencies to CMakeLists.txt.	2025-08-09 08:36:59 -05:00
Stanislav Mekhanoshin	10e146a716	[AMDGPU] Fix out of bound physreg tuple condition. NFC. (#152777 ) The end register of the tuple shall be below the last existing register. The check does not work on something like {v[255:256]}. Overall it works correctly because it fails later at the getMatchingSuperReg() call.	2025-08-09 01:50:13 -07:00
Tom Vijlbrief	160f5ca0f5	[AVR][NFC] Split AVRTargetTransformInfo.h to AVRTargetTransformInfo.cpp (#152841 )	2025-08-09 16:09:45 +08:00
Tom Vijlbrief	97f0ff0c80	[AVR] Fix Avr indvar detection and strength reduction (missed optimization) (#152028 ) Fix https://github.com/llvm/llvm-project/issues/151080	2025-08-09 12:46:32 +08:00
Matt Arsenault	0a0f077b94	AMDGPU: Add missing static to cl::opt (#152747 )	2025-08-09 08:33:09 +09:00
Deric C.	e13cb3e299	[DirectX] Update lifetime legalization to account for the removed size argument (#152791 ) Fixes #152754 - Fixes the ArgOperand index in `DXILOpLowering.cpp` used to obtain the pointer operand of a lifetime intrinsic. - Updates the tests `llvm/test/CodeGen/DirectX/legalize-lifetimes-valver-1.5.ll`, `llvm/test/CodeGen/DirectX/legalize-lifetimes-valver-1.6.ll`, `llvm/test/CodeGen/DirectX/ShaderFlags/lifetimes-noint64op.ll`, and `llvm/test/tools/dxil-dis/lifetimes.ll` to use the new size-less lifetime intrinsic - Removes lifetime intrinsics from the test `llvm/test/CodeGen/DirectX/legalize-memset.ll` to be consistent with the corresponding memcpy test which does not have lifetime intrinsics. (Removal of lifetime intrinsics from tests like this was suggested here in the past: https://github.com/llvm/llvm-project/pull/139173#discussion_r2091778868) - Rewrites the lifetime legalization functions in the EmbedDXILPass to re-add the explicit size argument for DXIL	2025-08-08 14:32:27 -07:00
Min-Yih Hsu	c065ed3912	[RISCV] Add intrinsics for strided segment stores with fixed vectors (#152038 ) These are the strided versions of `riscv.segN.store.mask` intrinsics.	2025-08-08 14:08:08 -07:00
Mikhail R. Gadelha	e91f68487c	[RISCV] Update SpacemiT-X60 vector fixed-point arithmetic latencies (#150517 ) This PR adds hardware-measured latencies for all instructions defined in Section 12 of the RVV specification: "Vector Fixed-Point Arithmetic Instructions" to the SpacemiT-X60 scheduling model.	2025-08-08 11:57:35 -03:00
David Green	26b302fd8b	[AArch64] Rename Cost -> PromotedCost to avoid shadowing error	2025-08-08 14:37:24 +01:00
David Green	7f1638efc1	[AArch64] Generalize costing for FP16 instructions (#150033 ) This extracts the code for modelling a fp16 operation as `fptrunc(fpop(fpext,fpext))` into a new function named getFP16BF16PromoteCost so that it can be reused by the arithmetic instructions. The function takes a lambda to calculate the cost of the operation with the promoted type.	2025-08-08 13:40:07 +01:00
Lucas Ramirez	83c308f014	[AMDGPU][Scheduler] Consistent occupancy calculation during rematerialization (#149224 ) The `RPTarget`'s way of determining whether VGPRs are beneficial to save and whether the target has been reached w.r.t. VGPR usage currently assumes, if `CombinedVGPRSavings` is true, that free slots in one VGPR RC can always be used for the other. Implicitly, this makes the rematerialization stage (only current user of `RPTarget`) follow a different occupancy calculation than the "regular one" that the scheduler uses, one that assumes that ArchVGPR/AGPR usage can be balanced perfectly and at no cost, which is untrue in general. This ultimately yields suboptimal rematerialization decisions that require cross-VGPR-RC copies unnecessarily. This fixes that, making the `RPTarget`'s internal model of occupancy consistent with the regular one. The `CombinedVGPRSavings` flag is removed, and a form of cross-VGPR-RC saving implemented only for unified RFs, which is where it makes the most sense. Only when the amount of free VGPRs in a given VGPR RC (ArchVGPR or AGPR) is lower than the excess VGPR usage in the other VGPR RC does the `RPTarget` consider that a pressure reduction in the former will be beneficial to the latter.	2025-08-08 14:26:04 +02:00
Diana Picus	a910a6a8b5	[AMDGPU] AsmPrinter: Unify arg handling (#151672 ) When computing the number of registers required by entry functions, the `AMDGPUAsmPrinter` needs to take into account both the register usage computed by the `AMDGPUResourceUsageAnalysis` pass, and the number of registers initialized by the hardware. At the moment, the way it computes the latter is different for graphics vs compute, due to differences in the implementation. For kernels, all the information needed is available in the `SIMachineFunctionInfo`, but for graphics shaders we would iterate over the `Function` arguments in the `AMDGPUAsmPrinter`. This pretty much repeats some of the logic from instruction selection. This patch introduces 2 new members to `SIMachineFunctionInfo`, one for SGPRs and one for VGPRs. Both will be computed during instruction selection and then used during `AMDGPUAsmPrinter`, removing the need to refer to the `Function` when printing assembly. This patch is NFC except for the fact that we now add the extra SGPRs (VCC, XNACK etc) to the number of SGPRs computed for graphics entry points. I'm not sure why these weren't included before. It would be nice if someone could confirm if that was just an oversight or if we have some docs somewhere that I haven't managed to find. Only one test is affected (its SGPR usage increases because we now take into account the XNACK registers).	2025-08-08 12:00:37 +02:00
Graham Hunter	de72cca671	[CostModel] Provide a default model for histogram intrinsics (#149348 ) Since we scalarize these intrinsics when the target does not support them, we should model that for costing purposes.	2025-08-08 11:00:00 +01:00
Matt Arsenault	81f3ddf4a2	AMDGPU: Rewrite VGPR MFMAs to AGPR when directly copied to AGPR class (#152480 )	2025-08-08 18:20:21 +09:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Cullen Rhodes	e9d71efb83	[AArch64] Mark [usp]mull, [us]addl, [us]abdl as commutative (#152158 ) Fixes #61461.	2025-08-08 09:35:28 +01:00
Nikita Popov	18e4f775c3	[SystemZ] Remove incorrect areInlineCompatible hook (#152494 ) This reverts https://github.com/llvm/llvm-project/pull/132976. The PR incorrectly claimed that this makes inlining more liberal, referencing the string comparison in TargetTransformInfoImpl.h. However, the implementation that actually applies is the one in BasicTTIImpl.h, which performs a feature subset comparison. As such, this regressed inlining, most concerningly of functions without +vector into functions with +vector. Revert the change to restore the previous behavior.	2025-08-08 10:06:19 +02:00
Paul Murphy	5f864560a6	[PowerPC] fix lowering of SPILL_CRBIT on pwr9 and pwr10 (#146424 ) If a copy exists between creation of a crbit and a spill, machine-cp may delete the copy since it seems unaware of the relation between a cr and crbit. A fix was previously made for the generic ppc64 lowering. It should be applied to the pwr9 and pwr10 variants too. Likewise, relax and extend the pwr8 test to verify pwr9 and pwr10 codegen too. This fixes #143989.	2025-08-08 09:24:22 +02:00
David Green	229ab5aa2b	[AArch64] Drop flags from BSP pseudos (#151856 ) This prevents cases where some of the operands match from hitting verifier errors with kill flags. These nodes should have been removed earlier in most cases. Fixes the direct issue from #149380. #151855 cleans up the codegen.	2025-08-08 07:47:56 +01:00
Luke Lau	7074471593	[RISCV] Enable tail folding by default (#151681 ) We have been tracking the performance of EVL tail folding in the loop vectorizer on RISC-V for a while now, and after much hard work from various contributors we think it should be generally profitable to enable by default now. With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU 2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30% geomean codesize reduction on SPEC and TSVC, with no significant regressions detected. Now that we are early into the LLVM 22.x development cycle it seems like a good time to enable it to catch any issues. There are still more EVL related items of work being tracked in #123069, which should continue to improve performance.	2025-08-08 14:26:23 +08:00
AZero13	6a425f1e54	[ARM] Have custom lowering for ucmp and scmp (#149315 ) Limited to non-thumb1 for scmp at the moment, since there is no good way to do it.	2025-08-08 06:51:18 +01:00
Jim Lin	b9ca01b746	[RISCV] Move the decoder table for XCV, Xqci and XRivos from standard section to vendor section. NFC	2025-08-08 11:18:18 +08:00
Fangrui Song	3769ce013b	MC: Refine ALIGN relocation conditions Each section now tracks the index of the first linker-relaxable fragment, enabling two changes: * Delete redundant ALIGN relocations before the first linker-relaxable instruction in a section. The primary example is the offset 0 R_RISCV_ALIGN relocation for a text section aligned by 4. * For alignments larger than the NOP size after the first linker-relaxable instruction, ALIGN relocations are now generated, even in norelax regions. This fixes the issue #150159. The new test llvm/test/MC/RISCV/Relocations/align-after-relax.s verifies the required ALIGN in a norelax region following linker-relaxable instructions. By using a fragment index within the subsection (which is less than or equal to the section's index), the implementation may generate redundant ALIGN relocations in lower-numbered subsections before the first linker-relaxable instruction. align-option-relax.s demonstrates the ALIGN optimization. Add an initial `call` to a few tests to prevent the ALIGN optimization. --- When the alignment exceeds 2, we insert $alignment-2 bytes of NOPs, even in non-RVC code. This enables non-RVC code following RVC code to handle a 2-byte adjustment without requiring an additional state in MCSection or AsmParser. ``` .globl _start _start: // GNU ld can relax this to 6505 lui a0, 0x1 // LLD hasn't implemented this transformation. lui a0, %hi(foo) .option push .option norelax .option norvc // Now we generate R_RISCV_ALIGN with addend 2, even if this is a norvc region. .balign 4 b0: .word 0x3a393837 .option pop foo: ``` Pull Request: https://github.com/llvm/llvm-project/pull/150816	2025-08-07 19:16:58 -07:00
Stanislav Mekhanoshin	469863111f	[AMDGPU] Enable CodeGen for v_pk_fma_bf16 (#152578 )	2025-08-07 16:19:59 -07:00
Stanislav Mekhanoshin	dddeb07c2e	[AMDGPU] Restrict packed math FP32 instructions to read only one SGPR per operand on gfx12+ (#152465 ) Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used as an operand, only one SGPR will be read for both the low and high operations. As a result, the corresponding bits in `op_sel` and `op_sel_hi` must be the same when the operand is an SGPR. Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>	2025-08-07 16:13:34 -07:00
Stanislav Mekhanoshin	82046c7f33	[AMDGPU] Adjust hard clause rules for gfx1250 (#152592 ) Change from GFX12: Relax S_CLAUSE rules to allow all non-flat memory types in the same clause, and all Flat types in the same clause. For VMEM/FLAT, clause types now look like: - Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async - Flat: load, store, atomic	2025-08-07 14:59:31 -07:00
Stanislav Mekhanoshin	abc22f771e	[AMDGPU] Fix buffer addressing mode matching (#152584 ) Starting in gfx1250, voffset and immoffset are zero-extended from 32 bits to 45 bits before being added together.	2025-08-07 14:23:41 -07:00
Hood Chatham	b9c328480c	[clang][WebAssembly] Support reftypes & varargs in test_function_pointer_signature (#150921 ) I fixed support for varargs functions (previously it didn't crash but the codegen was incorrect). I added tests for structs and unions which already work. With the multivalue abi they crash in the backend, so I added a sema check that rejects structs and unions for that abi. It will also crash in the backend if passed an int128 or float128 type.	2025-08-07 13:07:04 -07:00
Stanislav Mekhanoshin	d09dbdabb9	[AMDGPU] bf16 clamp folding (#152573 )	2025-08-07 12:59:50 -07:00


86534 Commits