llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	2f0400c1a1	[Thumb2] mve-shuffle.ll - add missing check prefix coverage for some fullfp16 cases (#180567 ) Noticed while working on some upcoming generic shuffle handling	2026-02-10 13:37:24 +00:00
Simon Pilgrim	dca7b11a32	[X86] Add tests showing failure to reduce the vector width of vpmaddwd/vpmaddubsw/pmulhrsw nodes (#180728 ) Missing demanded elts handling	2026-02-10 12:45:21 +00:00
Steffen Larsen	9501114ca0	[Verifier] Make verifier fail when global variable size exceeds address space size (#179625 ) When a global variable has a size that exceeds the size of the address space it resides in, the verifier should fail as the variable can neither be materialized nor fully accessed. This patch adds a check to the verifier to enforce it. --------- Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com> Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>	2026-02-10 13:27:38 +01:00
Mirko Brkušanin	4280f0d241	[AMDGPU] Add dot4 fp8/bf8 instructions for gfx1170 (#180516 )	2026-02-10 12:14:49 +01:00
Anshil Gandhi	bd6dd94584	[AMDGPU] Add legalization rules for atomicrmw max/min ops (#180502 ) Adds rules for G_ATOMICRMW_{MAX, MIN, UMAX, UMIN, UINC_WRAP, UDEC_WRAP}. Each of these generic opcodes is supported for S32 and S64 types on flat, global and local address spaces.	2026-02-10 16:04:05 +05:30
Kerry McLaughlin	e043195ef4	[AArch64] Add support for intent to read prefetch intrinsic (#179709 ) This patch adds support in Clang for the PRFM IR instruction, by adding the following builtin: void __pldir(void const *addr); This builtin is described in the following ACLE proposal: https://github.com/ARM-software/acle/pull/406	2026-02-10 10:12:52 +00:00
Matt Arsenault	302ff8fd00	InstCombine: Use SimplifyDemandedFPClass on fmul (#177490 ) Start trying to use SimplifyDemandedFPClass on instructions, starting with fmul. This subsumes the old transform on multiply of 0. The main change is the introduction of nnan/ninf. I do not think anywhere was systematically trying to introduce fast math flags before, though a few odd transforms would set them. Previously we only called SimplifyDemandedFPClass on function returns with nofpclass annotations. Start following the pattern of SimplifyDemandedBits, where this will be called from relevant root instructions. I was wondering if this should go into InstCombineAggressive, but that apparently does not make use of InstCombineInternal's worklist.	2026-02-10 09:49:31 +00:00
Benjamin Maxwell	b91eb9b4e5	[SDAG] Implement missing legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` (#180290 ) This lowers the splitting as: ``` any_active(hi_mask) ? (find_last_active(hi_mask) + lo_mask.getVectorElementCount()) : find_last_active(lo_mask) ``` And trivially lowers `<1 x i1>` scalarization to returning zero. Which is a natural result of the splitting (and the lack of a sentinel "none-active" result value). The lowerings likely can be improved. This patch is for completeness. Should fix: https://github.com/llvm/llvm-project/pull/178862#issuecomment-3862310334 Fixes #180212	2026-02-10 09:01:13 +00:00
Diana Picus	24405f070f	[AMDGPU] Add intrinsic exposing s_alloc_vgpr (#163951 ) Make it possible to use `s_alloc_vgpr` at the IR level. This is a huge footgun and use for anything other than compiler internal purposes is heavily discouraged. The calling code must make sure that it does not allocate fewer VGPRs than necessary - the intrinsic is NOT a request to the backend to limit the number of VGPRs it uses (in essence it's not so different from what we do with the dynamic VGPR flags of the `amdgcn.cs.chain` intrinsic, it just makes it possible to use this functionality in other scenarios).	2026-02-10 09:28:31 +01:00
Pengcheng Wang	8c5f31b365	[RISCV] Enable select optimization by default (#178394 ) And we add `TuneEnableSelectOptimize` to: * `generic` * `generic-ooo` * `sifive-p550` * `spacemit-x60`	2026-02-10 16:19:01 +08:00
Kyungtak Woo	a56b877056	[NewPM] Port x86-global-base-reg (#180119 ) Had to move X86GlobalBaseRegPass to its own file like in https://github.com/llvm/llvm-project/pull/179864 No test coverage added for now as there are no MIR->MIR tests exercising this pass and we do not have enough ported to run any end to end tests. This is a redo of https://github.com/llvm/llvm-project/pull/180070	2026-02-09 22:54:41 -08:00
Craig Topper	f33ea53451	[RISCV] Remove redundant czero in multi-word comparisons (#180485 ) When comparing multi-word integers with Zicond, we generate: (or (czero_eqz (lo1 < lo2), (hi1 == hi2)), (czero_nez (hi1 < hi2), (hi1 == hi2))) The czero_nez is redundant because when hi1 == hi2 is true, hi1 < hi2 is already 0. This patch adds a DAG combine to recognize: czero_nez (setcc X, Y, CC), (setcc X, Y, eq) -> (setcc X, Y, CC) when CC is a strict inequality (lt, gt, ult, ugt). This saves one instruction in 128-bit comparisons on RV64 with Zicond. Note the czero_nez becomes a czero.eqz in the final assembly because the seteq is replaced by an xor that produces 0 when the values are equal. Part of #179584 Assisted-by: claude	2026-02-09 21:48:14 -08:00
vangthao95	8d8864237b	AMDGPU/GlobalISel: Regbanklegalize rules for G_FSQRT (#179817 ) Add S16 rules for G_FSQRT. S32 and S64 are expanded by the legalizer.	2026-02-09 18:24:28 -08:00
Lleu Yang	8fc59bc0e3	[SPIRV] Add handling for `uinc_wrap` and `udec_wrap` atomics (#179114 ) This adds atomicrmw `uinc_wrap` and `udec_wrap` operations support for SPIR-V. Since SPIR-V doesn't provide dedicated instructions for those two operations, we have to use the `AtomicExpand` pass to expand the operations into CAS forms. Closes #177204.	2026-02-10 01:39:05 +01:00
Dmitry Sidorov	19705bd7fc	[SPIR-V] Emit ceil(Bitwidth / 32) words during OpConstant creation (#180218 ) Fixes an error in handling constant integers with a width in the (64; 128) range. Found during review of https://github.com/llvm/llvm-project/pull/180182	2026-02-10 00:40:35 +01:00
Daniel Paoliello	853a39043e	[win][aarch64] The Windows Control Flow Guard Check function also preserves X15 (#179738 ) The target function to be checked by the Control Flow Guard Check function is stored in `X15` on AArch64. This register is guaranteed to be preserved by that function (on success), thus after it returns `X15` can be used to branch to the target function instead of having to load it from another register or the stack.	2026-02-09 15:35:20 -08:00
Ryan Buchner	d69ccf3b34	[RISCV] Combine shuffle of shuffles to a single shuffle (#178095 ) Compressing to a single shuffle doesn't remove any information and the backend can better apply specific optimizations to a single shuffle. Addresses #176218. --------- Co-authored-by: Luke Lau <luke_lau@igalia.com>	2026-02-09 14:48:31 -08:00
Steven Perron	e1d2ff6caf	[SPIRV] Implement lowering for HLSL Texture2D sampling intrinsics (#179312 ) This patch implements the SPIR-V lowering for the following HLSL intrinsics: - SampleBias - SampleGrad - SampleLevel - SampleCmp - SampleCmpLevelZero It defines the required LLVM intrinsics in 'IntrinsicsDirectX.td' and 'IntrinsicsSPIRV.td'. It updates 'SPIRVInstructionSelector.cpp' to handle the new intrinsics and generates the correct 'OpImageSample*' instructions with the required operands (Bias, Grad, Lod, ConstOffset, MinLod, etc.). CodeGen tests are added to verify the implementation for images with dimension 1D, 2D, 3D, and Cube. Assisted-by: Gemini	2026-02-09 17:28:58 -05:00
Alexey Merzlyakov	4136d3f248	[AArch64] Inline asm v0-v31 are scalar when having less than 64-bit capacity (#169930 ) If 32-bit (or less) "v0" registers coming from inline asm are treated as vector ones, codegen might produce incorrect vector<->scalar conversions. This causes types mismatch assertion failures later during compile-time. The fix treats 32-bit or less v0-v31 AArch64 registers as scalar, along with 64-bit ones. Fixes #153442	2026-02-09 13:26:31 -08:00
Gheorghe-Teodor Bercea	d1dc843c18	[AMDGPU] Enable sinking of free vector ops that will be folded into their uses (#162580 ) Sinking ShuffleVectors / ExtractElement / InsertElement into user blocks can help enable SDAG combines by providing visibility to the values instead of emitting CopyTo/FromRegs. The sink IR pass disables sinking into loops, so this PR extends the CodeGenPrepare target hook shouldSinkOperands. Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2026-02-09 14:14:31 -05:00
John Brawn	77cb666078	[AArch64] Add support for B and H loads/stores in LoadStoreOptimizer (#180535 ) This means the load/store optimizer can generate pre and post increment versions of these instructions.	2026-02-09 17:48:31 +00:00
vangthao95	404f9e6c99	AMDGPU/GlobalISel: RegBankLegalize rules for amdgcn_sffbh (#180099 ) Change test to use update_llc_test_checks.py and make `v_flbit` test actually divergent.	2026-02-09 09:18:03 -08:00
vangthao95	0040bdf532	AMDGPU/GlobalISel: Regbanklegalize rules for buffer atomic swap (#180265 )	2026-02-09 09:04:17 -08:00
Ryan Mitchell	8bbdac9e52	[MIParser] - Add support for MMRAs (#180320 ) Probably just forgotten in #78569	2026-02-09 18:01:02 +01:00
Craig Topper	e6a72a1d42	[RISCV] Combine ADDD+WMULSU to WMACCSU (#180454 ) Extend the existing combineADDDToWMACC DAG combine to also match RISCVISD::WMULSU and produce RISCVISD::WMACCSU. This is similar to how ADDD+UMUL_LOHI is combined to WMACCU and ADDD+SMUL_LOHI is combined to WMACC. This patch was generated by AI, but I reviewed it.	2026-02-09 08:51:27 -08:00
Simon Pilgrim	a911fc12ec	[X86] Fold expand(splat,passthrough,mask) -> select(splat,passthrough,mask) (#180238 ) If all elements of the expansion vector are already splatted in place then we can use a vselect directly	2026-02-09 16:15:41 +00:00
Nathan Gauër	091972c354	[SPIR-V] initial support for @llvm.structured.gep (#178668 ) This commit adds initial support to lower the intrinsic `@llvm.structured.gep` into proper SPIR-V. For now, the backend continues to support both GEP formats. We might want to revisit this at some point for the logical part.	2026-02-09 15:51:07 +00:00
Anshil Gandhi	ab2e10d80f	[AMDGPU] Add legalization rules for G_ATOMICRMW_FADD (#175257 ) G_ATOMICRMW_FADD is supported on flat, global and local address spaces for S32, S64 and V2S16 values.	2026-02-09 15:37:27 +00:00
Shilei Tian	65b4099219	[AMDGPU] Fix instruction size for 64-bit literal constant operands (#180387 ) `getLit64Encoding` uses a different approach to determine whether 64-bit literal encoding is used, which caused a size mismatch between the `MachineInstr` and the `MCInst`. For `!isValid32BitLiteral`, it is effectively `!(isInt<32>(Val) || isUInt<32>(Val))`, which is `!isInt<32>(Val) && !isUInt<32>(Val)`, but in `getLit64Encoding`, it is `!isInt<32>(Val) || !isUInt<32>(Val)`.	2026-02-09 14:31:52 +00:00
Dmitry Sidorov	f6ee5bd4df	[SPIRV] Fix constant materialization for width > 64bit (#180182 ) selectConst() was asserting for constants wider than 64 bits. Add APInt overloads of getOrCreateConstInt and getOrCreateConstVector that avoid the uint64_t truncation.	2026-02-09 15:05:23 +01:00
Shilei Tian	392f0c9767	[NFC][AMDGPU] Add a test to show the impact of wrong `s_mov_b64` instruction size (#180386 )	2026-02-09 08:56:28 -05:00
Djordje Todorovic	b2e6b98783	[MIPS] Fix argument size in Fast ISel (#180336 ) Fix bug where Fast ISel incorrectly set `IncomingArgSize` to `0` for functions with no arguments, since `MIPS O32` uses _the reserved argument area_ of 16 bytes even for the functions with no args at all.	2026-02-09 21:36:35 +08:00
Mirko Brkušanin	45b037cf7a	[AMDGPU] Add fp8/bf8 conversion instructions for gfx1170 (#180191 )	2026-02-09 13:56:43 +01:00
Simon Pilgrim	964651ad51	[X86] Allow handling of i128/256/512 SELECT on the FPU (#180197 ) If the scalar integer selection sources are freely transferable to the FPU, then splat to create an allbits select condition and create a vector select instead	2026-02-09 10:34:02 +00:00
Petr Kurapov	27a8ab09fa	[AMDGPU] Fix V_INDIRECT_REG_READ_GPR_IDX expansion with immediate index (#179699 ) The definition for V_INDIRECT_REG_READ_GPR_IDX_B32_V*'s SSrc_b32 operand allows immediates, but the expansion logic handles only register cases now. This can result in expansion failures when e.g. llvm.amdgcn.wave.reduce.umin.i32 is folded into a constant and then used as an insertelement idx.	2026-02-09 11:33:30 +01:00
Matt Arsenault	2ffb54364f	AMDGPU: Add a test for libcall simplify pow handling (#180491 ) This case could be turned into powr or pown, so track which case ends up preferred.	2026-02-09 10:01:26 +00:00
Gergo Stomfai	2298b8606d	[GISel] computeKnownBits - add CTLS handling (#178063 ) Closes llvm/llvm-project#174370	2026-02-09 09:30:45 +00:00
Pierre van Houtryve	b79ba02479	[AMDGPU][GFX12.5] Reimplement monitor load as an atomic operation (#177343 ) Load monitor operations make more sense as atomic operations, as non-atomic operations cannot be used for inter-thread communication w/o additional synchronization. The previous built-in made it work because one could just override the CPol bits, but that bypasses the memory model and forces the user to learn about ISA bits encoding. Making load monitor an atomic operation has a couple of advantages. First, the memory model foundation for it is stronger. We just lean on the existing rules for atomic operations. Second, the CPol bits are abstracted away from the user, which avoids leaking ISA details into the API. This patch also adds supporting memory model and intrinsics documentation to AMDGPUUsage. Solves SWDEV-516398.	2026-02-09 09:57:27 +01:00
Matt Arsenault	8554ed738f	AMDGPU: Add syntax for s_wait_event values (#180272 ) Previously this would just print hex values. Print names for the recognized values, matching the sp3 syntax.	2026-02-09 08:29:55 +00:00
Matt Arsenault	0c583e784e	AMDGPU: Add llvm.amdgcn.s.wait.event intrinsic (#180170 ) Exactly match the s_wait_event instruction. For some reason we already had this instruction used through llvm.amdgcn.s.wait.event.export.ready, but that hardcodes a specific value. This should really be a bitmask that can combine multiple wait types. gfx11 -> gfx12 broke compatibility in a weird way, by inverting the interpretation of the bit but also shifting the used bit by 1. Simplify the selection of the old intrinsic by just using the magic number 2, which should satisfy both cases.	2026-02-09 08:45:13 +01:00
Pengcheng Wang	972e73b812	[RISCV][CodeGen] Lower `ISD::ABS` to Zvabd instructions We add pseudos/patterns for `vabs.v` instruction and handle the lowering in `RISCVTargetLowering::lowerABS`. Reviewers: topperc, 4vtomat, mshockwave, preames, lukel97, tclin914 Reviewed By: mshockwave Pull Request: https://github.com/llvm/llvm-project/pull/180142	2026-02-09 15:21:25 +08:00
Pengcheng Wang	e992593341	[RISCV][CodeGen] Lower `abds`/`abdu` to `Zvabd` instructions We directly lower `ISD::ABDS`/`ISD::ABDU` to `Zvabd` instructions. Note that we only support SEW=8/16 for `vabd.vv`/`vabdu.vv`. Reviewers: mshockwave, lukel97, topperc, preames, tclin914, 4vtomat Reviewed By: lukel97, topperc Pull Request: https://github.com/llvm/llvm-project/pull/180141	2026-02-09 15:12:22 +08:00
Pengcheng Wang	151fadecd1	[RISCV][MC] Support experimental Zvabd instructions The `Zvabd` is for `RISC-V Integer Vector Absolute Difference` and it provides 5 instructions: * `vabs.v`: Vector Signed Integer Absolute. * `vabd.vv`: Vector Signed Integer Absolute Difference. * `vabdu.vv`: Vector Unsigned Integer Absolute Difference. * `vwabda.vv`: Vector Signed Integer Absolute Difference And Accumulate. * `vwabdau.vv`: Vector Unsigned Integer Absolute Difference And Accumulate. Doc: https://github.com/riscv/integer-vector-absolute-difference Reviewers: topperc, lukel97, preames, tclin914, asb, kito-cheng, mshockwave Pull Request: https://github.com/llvm/llvm-project/pull/180139	2026-02-09 14:18:39 +08:00
JaydeepChauhan14	fad32ff3ea	[X86] Optimized ADC + ADD to ADC (#176713 )	2026-02-09 11:43:56 +05:30
Jim Lin	84b5e9f8db	[RISCV] Add used callee-saved registers as implicit/implicit-def registers to save/restore call (#180133 ) We should add used callee-saved registers as implicit uses on the save libcall and as implicit defs on the restore libcall, as we already do for CM_PUSH/CM_POPRET. This helps construct correct dataflow: in the entry bb, the save libcall implicitly uses the callee-saved registers that are live in, and in the return bb, the restore libcall implicitly defines the callee-saved registers that are live out.	2026-02-09 10:00:32 +08:00
paperchalice	c53acf0443	[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904 ) Replaced by checking fast-math flags or value tracking results.	2026-02-09 09:48:07 +08:00
paperchalice	5c5677d7b8	[llvm] Remove "no-infs-fp-math" attribute support (#180083 ) One of global options in `TargetMachine::resetTargetOptions`, now all backends no longer support it, remove it.	2026-02-09 08:43:33 +08:00
Craig Topper	769b734c02	[RISCV] Combine ADDD with UMUL_LOHI/SMUL_LOHI into WMACCU/WMACC (#180383 ) Combine the pattern: ADDD(addlo, addhi, UMUL_LOHI(x, y).0, UMUL_LOHI(x, y).1) into: WMACCU(x, y, addlo, addhi) And similarly for SMUL_LOHI -> WMACC. This patch was written with AI, but I reviewed it carefully.	2026-02-08 13:39:32 -08:00
Craig Topper	a563e6bb7e	[RISCV] Add support for forming WMULSU during type legalization. (#180331 ) Add a DAG combine to turn it into MULHSU if the lower half result is unused.	2026-02-08 12:38:56 -08:00
Simon Pilgrim	ed9c18693b	[MIPS] musttail.ll - regenerate test checks (#180423 )	2026-02-08 17:41:30 +00:00
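
The size mismatch described in commit 65b4099219 ([AMDGPU] Fix instruction size for 64-bit literal constant operands) comes down to the two conditions not being De Morgan equivalents. The following is a minimal Python sketch of that logic only, not the LLVM code: `is_int32`, `is_uint32`, `needs_lit64_and` and `needs_lit64_or` are hypothetical stand-ins for `isInt<32>`, `isUInt<32>`, the `!isValid32BitLiteral` condition and the condition quoted for `getLit64Encoding`.

```python
def is_int32(val: int) -> bool:
    """Stand-in for isInt<32>: val fits in a signed 32-bit integer."""
    return -(1 << 31) <= val < (1 << 31)


def is_uint32(val: int) -> bool:
    """Stand-in for isUInt<32>: val fits in an unsigned 32-bit integer."""
    return 0 <= val < (1 << 32)


def needs_lit64_and(val: int) -> bool:
    """!(isInt<32> || isUInt<32>), i.e. !isInt<32> && !isUInt<32>."""
    return not (is_int32(val) or is_uint32(val))


def needs_lit64_or(val: int) -> bool:
    """!isInt<32> || !isUInt<32>, the condition quoted for getLit64Encoding."""
    return (not is_int32(val)) or (not is_uint32(val))


def disagreements(values):
    """Return the values on which the two conditions give different answers."""
    return [v for v in values if needs_lit64_and(v) != needs_lit64_or(v)]


# Hypothetical sample values chosen around the 32-bit boundaries.
SAMPLES = [0, 1, -1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF,
           0x100000000, -(1 << 31), -(1 << 31) - 1]

if __name__ == "__main__":
    for v in disagreements(SAMPLES):
        print(hex(v), needs_lit64_and(v), needs_lit64_or(v))
```

In this model the two conditions differ exactly for values that fit one of the 32-bit ranges but not the other, such as `0x80000000` or `-1`; values that fit both ranges or neither are classified the same way by both.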

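The split lowering quoted in commit b91eb9b4e5 ([SDAG] Implement missing legalization for `ISD::VECTOR_FIND_LAST_ACTIVE`) can be checked against a direct scan with a small Python model. This is a sketch under stated assumptions rather than the SelectionDAG implementation: masks are plain lists of booleans, the function names are hypothetical, and returning 0 when no lane is active is a modeling choice that mirrors the `<1 x i1>` case in the commit message.

```python
from itertools import product


def find_last_active(mask):
    """Index of the last True lane; 0 if no lane is active (assumed here)."""
    for i in range(len(mask) - 1, -1, -1):
        if mask[i]:
            return i
    return 0


def find_last_active_split(mask):
    """Model of the split form from the commit message:
    any_active(hi) ? find_last_active(hi) + len(lo) : find_last_active(lo)
    """
    half = len(mask) // 2
    lo, hi = mask[:half], mask[half:]
    if any(hi):
        return find_last_active(hi) + len(lo)
    return find_last_active(lo)


def split_matches_reference(width):
    """Exhaustively compare both forms over every mask of the given width."""
    return all(
        find_last_active_split(list(m)) == find_last_active(list(m))
        for m in product([False, True], repeat=width)
    )


if __name__ == "__main__":
    print(split_matches_reference(8))
```

The exhaustive comparison is cheap for small widths and makes the role of the `lo_mask` element-count offset visible: without it, an active lane in the high half would be reported relative to the start of that half.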

63532 Commits