llvm-project

Author	SHA1	Message	Date
Ricardo Jesus	e8391d4619	[AArch64] Set default schedule of load-acquire RCpc instructions. (#172881 ) This patch sets the default schedule of RCpc load-acquires to WriteLD, same as it's done for rcpc-immo load-acquires.	2025-12-19 09:25:34 +00:00
Matt Arsenault	0db4393762	AMDGPU: Add baseline tests for f64 rsq pattern handling (#172052 )	2025-12-19 10:12:25 +01:00
Justin Bogner	b324c9f4fa	[DirectX] Move memset and memcpy handling to a new pass. NFC (#172921 ) This introduces the DXILMemIntrinsics pass and moves memset and memcpy handling from DXILLegalize to here. We need to do this so that we can handle memory intrinsics before the DXILResourceAccess pass so that we can properly deal with arrays and large structures in resources.	2025-12-18 22:08:43 -07:00
Brandon Wu	0e03199e81	[RISCV][llvm] Remove custom legalization of fixed-length vector SPLAT_VECTOR (#172870 ) BUILD_VECTOR is combined to SPLAT_VECTOR if operation action of SPLAT_VECTOR is not Expand. However we already have custom handle of BUILD_VECTOR for fixed-length vector which has explicit constant VL instead of making it VLMAX if lowered through SPLAT_VECTOR.	2025-12-19 11:45:10 +08:00
Alex MacLean	a40f444265	[NVPTX] Add support for barrier.cta.red.* instructions (#172541 ) This change adds full support for the ptx `barrier.cta.red` instruction, following the same conventions as are already used for `barrier.cta.sync` and `barrier.cta.arrive`. In addition this MR removes the following intrinsics which are no longer needed: * llvm.nvvm.barrier0.popc --> llvm.nvvm.barrier.cta.red.popc.aligned.all(0, c) * llvm.nvvm.barrier0.and --> llvm.nvvm.barrier.cta.red.and.aligned.all(0, z) * llvm.nvvm.barrier0.or --> llvm.nvvm.barrier.cta.red.or.aligned.all(0, z)	2025-12-18 18:06:27 -08:00
KRM7	c9aea6248a	[RegisterCoalescer] Don't commute two-address instructions which only define a subregister (#169031 ) Currently, the register coalescer may try to commute an instruction like: ``` %0.sub_lo32:gpr64 = AND %0.sub_lo32:gpr64(tied-def 0), %1.sub_lo32:gpr64 USE %0:gpr64 ``` resulting in: ``` %1.sub_lo32:gpr64 = AND %1.sub_lo32:gpr64(tied-def 0), %0.sub_lo32:gpr64 USE %1:gpr64 ``` However, this is not correct if the instruction doesn't define the entire register, as the value of the upper 32-bits of the register used in `USE` will not be the same.	2025-12-18 23:24:44 +01:00
Harald van Dijk	a9b62e8324	[AArch64] Make IFUNC opt-in rather than opt-out. (#171648 ) IFUNCs require loader support, so for arbitrary environments, the safe assumption is to assume that they are not supported. In particular, aarch64-linux-pauthtest may be used with musl, and was wrongly detected as supporting IFUNCs. With IFUNC support now being detected more reliably, this also removes the check for PAuth support. If both are supported, either would work.	2025-12-18 22:17:07 +00:00
Justin Bogner	c3039a7dc5	[DirectX] Avoid precalculating GEPs in DXILResourceAccess (#172720 ) Instead of trying to precalculate GEP offsets ahead of time and then process resource accesses based off of these offsets, traverse the GEP chain inline for each access. This makes it easier to get the types correct when translating GEPs for cbuffer and structured buffer accesses, which in turn lets us access individual elements of those structures directly. Fixes #160208, #164517, and #169430	2025-12-18 22:15:12 +00:00
Erik Enikeev	4cbaa40f70	[mips][micromips] Add mayRaiseFPException to appropriate instructions, mark all instructions that read FCSR (FCR31) rounding bits as doing so (#170322 )	2025-12-18 23:06:36 +01:00
vangthao95	031e9c989e	[AMDGPU][GlobalISel] Add RegBankLegalize support for G_FPTRUNC (#171723 )	2025-12-18 13:16:46 -08:00
Steven Perron	7486c6987e	[SPIRV] Restrict OpName generation to major values (#171886 ) Refines OpName emission to only target Global Variables, Functions, Function Parameters, Local Variables (allocas/phis), and Basic Blocks. This reduces binary size and clutter by avoiding OpName for every intermediate instruction (arithmetic, casts, etc.), while preserving readability for interfaces and program structure. Also updates the test suite to align with this change: - Removes OpName checks for intermediate instructions. - Adds side-effects (e.g., volatile stores) to tests where instructions were previously kept alive solely by their OpName usage. - Updates checks to use generic ID matching where specific names are no longer available. - Adds debug-info/opname-filtering.ll to verify the new policy.	2025-12-18 19:46:59 +00:00
Arthur Eubanks	4204244301	[X86] Fix sext optimization accidentally applying to large code model (#172721 ) Otherwise we were seeing "unsupported relocation" errors when referencing a small symbol under the large code model. This regresses some cases where a large function references a small global (e.g. relocimm-code-model.ll), but that's probably not super important.	2025-12-18 18:39:13 +00:00
Gaëtan Bossu	ef58e6f6af	[SDAG] Widen TRUNCATE to intermediate type to avoid ISel failure (#172473 ) SelectionDAG offered no way to widen TRUNCATE for pathological types like <vscale x 1 x ...> as they do not allow scalarisation. One way to go further is to widen to an intermediate type, which allows the element type to be promoted in a later run of legalisation.	2025-12-18 17:19:34 +00:00
vangthao95	55089733b6	[AMDGPU][GlobalISel] Add readanylane combines for merge-like instructions (#172546 ) When a merge-like instruction has all readanylane sources and the result is copied to VGPRs, eliminate the readanylanes by either using the original unmerge source directly or building a new merge with the VGPR sources.	2025-12-18 08:04:06 -08:00
Sudharsan Veeravalli	3bf0a8d6e1	[RISCV] Add Xqci feature flag (#172608 ) This patch adds an experimental Xqci feature flag that covers all the sub-extensions in the Qualcomm uC Extension.	2025-12-18 21:32:49 +05:30
Simon Pilgrim	50bda7296b	[X86] combineConcatVectorOps - add handling for SITOFP vector ops (#172866 )	2025-12-18 15:44:16 +00:00
Craig Topper	a256c03206	[RISCV] Rename -enable-p-ext-codegen to -riscv-enable-p-ext-simd-codegen. (#172790 ) Make it clear this only applies to SIMD code and that it belongs to RISC-V.	2025-12-18 07:11:16 -08:00
Min-Yih Hsu	e742015f43	[RISCV] Assign separate latencies for vector COPYs in SpacemitX60 scheduling model (#172556 ) Currently, we assign the same scheduling info to COPY regardless of whether it's a scalar or vector one. But this might cause a vector COPY from physical registers to be scheduled too close to its consumer, prolonging the physical register live range and running out of registers during RA as seen in #167008 . This patch addresses this issue by creating schedule variants for COPY instructions of vector register classes so that they can have the same latency as simple vector arithmetic (WriteVIALUV). It is worth noting that we _only_ need latency in this case -- keeping processor resources in (vector) COPYs still causes the aforementioned register shortage issue, because these COPYs might then be blocked by structural hazards and, again, get sunk further down than we want.	2025-12-18 07:04:42 -08:00
macurtis-amd	e741cd88a1	AMDGPU/PromoteAlloca: Fix handling of users of multiple allocas (#172771 ) With recent refactoring, LDS promotion worklists for all allocas are populated upfront. In some cases, this results in a User appearing in multiple lists. Then, as each list is processed, a User might get deleted via removeFromParent, potentially leaving a dangling pointer in a subsequent worklist. Currently this only occurs for memcpy and memmove. Prior to refactoring, these were handled by DeferredInstr, and were processed after the last use of the then singular worklist. This change moves processing of DeferredInstr to after all worklists have been processed.	2025-12-18 08:41:21 -06:00
guan jian	4e675a0c45	[SelectionDAG] Lowering usub.sat(a, 1) to a - (a != 0) (#170076 ) I recently observed that LLVM generates the following code: ``` addi a1, a0, -1 sltu a0, a0, a1 addi a0, a0, -1 and a0, a0, a1 ret ``` This could be optimized using the snez instruction instead.	2025-12-18 14:31:53 +00:00
Simon Pilgrim	345d763986	[X86] Add tests showing failure to concat matching SITOFP/UITOFP vector ops (#172852 ) Tests have to perform an additional FADD to prevent combineConcatVectorOfCasts from performing the fold - we're trying to show when this fails to occur during a combineConcatVectorOps recursion. Interestingly, due to uitofp expansion, AVX1/2 often manages to concat where AVX512 can't.	2025-12-18 14:28:12 +00:00
Benjamin Maxwell	492ca62e2c	[AArch64][SVE] Generalize extract_elt => plast fold to i32 indices (#172692 ) This occurs after type legalization, so the index type can be i32 or i64. This patch simplifies the matching and checks for the optional zero extend. Also, a few tests from when this fold was added had broken due to incorrectly adding `nuw` to the `add <eltCount>, #-1`, which this patch corrects.	2025-12-18 14:15:20 +00:00
Simon Pilgrim	cd7c511cc0	[X86] combineConcatVectorOps - add handling for CVTPS2DQ/CVTTPS2DQ vector ops (#172841 )	2025-12-18 12:52:11 +00:00
Paul Walker	cba7bb9d2f	[LLVM][CodeGen][X86] Make printConstant's output for vector ConstantFP match that of ConstantVector. (#172679 )	2025-12-18 11:58:05 +00:00
Simon Pilgrim	5f84dfff53	[X86] Add tests showing failure to concat matching CVTPS2DQ/CVTTPS2DQ vector ops (#172836 )	2025-12-18 11:55:21 +00:00
Frederik Harwath	5c05824d2b	[CodeGen] Rename expand-fp to expand-ir-insts (#172681 ) The pass now contains a non-fp expansion and should be used for any similar expansions regardless of the types involved. Hence a generic name seems apt. Rename the source files, pass, and adjust the pass description. Move all tests for the expansions that have previously been merged into the pass to a single directory.	2025-12-18 11:15:04 +00:00
Matt Arsenault	d6f159dd05	AMDGPU: Add pattern for copysign of 0 (#172699 ) Avoiding v_bfi_b32 is desirable since on gfx9 it requires materializing the constant. Similar could be done for infinity, with or 0x7fffffff	2025-12-18 11:34:24 +01:00
Nathan Gauër	8cfda79105	[HLSL][SPIR-V] Implement vk::push_constant (#166793 ) Implements initial support for vk::push_constant. As is, this allows handling simple push constants, but has one main issue: layout can be incorrect (See #168401). The layout issue being not only push-constant related, it's ignored for this PR. The frontend part of the implementation is straightforward: - adding a new attribute - when targeting vulkan/spirv, we process it - global variables with this attribute get a new AS: hlsl_push_constant The IR has nothing specific, only some RO globals in this new AS. On the SPIR-V side, we now convert this AS into a PushConstant storage class. But this creates some issues: the variables in this storage class must have a specific set of decorations to define their layout. Current infra to create the SPIR-V types lacks the context required to make this decision: no indication on the AS or context around the type being created. Refactoring this would be a heavy task as it would require getting this information in every place using the GR for type creation. Instead, we do something similar to CBuffers: - find all globals with this address space, and change their type to a target-specific type. - insert a new intrinsic in place of every reference to this global variable. This allows the backend to handle both layout variables loads and type lowering independently. Type lowering has nothing specific: when we encounter a target extension type with spirv.PushConstant, we lower this to the correct SPIR-V type with the proper offset & block decorations. As for the intrinsic, it's mostly a no-op, but required since we have this target-specific type. Note: this implementation prevents the static declaration of multiple push constants in a single shader module. The actual specification is more relaxed: there can be only one used push constant block per entrypoint. To correctly implement this, we would need to keep some additional state to determine the list of statically used resources per entrypoint. This shall be addressed as a follow-up (see #170310)	2025-12-18 11:01:11 +01:00
Benjamin Maxwell	c8bf963282	[AArch64][SVE] Rework VECTOR_COMPRESS lowering (#171162 ) This removes the use of `LowerVECTOR_COMPRESS` in `ReplaceNodeResults` (which was used to promote illegal integer VTs), and instead only marks the legal VTs as "Custom" (allowing for standard type legalization). This patch also simplifies the lowering by using the existing fixed-length <-> SVE conversion helpers. This was intended to be an NFC, but it appears to have caused some minor code-gen changes/improvements.	2025-12-18 09:34:17 +00:00
Kevin Per	98b82f90df	[PowerPC]: Add check for cast when shufflevector (#172443 ) The crash happens because the cast for `Mask = cast<ShuffleVectorSDNode>(Res)->getMask();` fails for node `t197: v16i8 = vector_shuffle<16,17,18,19,4,5,6,7,8,9,10,11,u,u,u,u> t196, t196`. However, both `LHS` and `RHS` are the same node, so `DAG.getCommutedVectorShuffle` doesn't return a `ShuffleVectorSDNode` and crashes. The fix is to add a check before the cast is performed. Closes https://github.com/llvm/llvm-project/issues/172265	2025-12-18 17:14:01 +08:00
Frederik Harwath	71760f324f	[CodeGen] Merge ExpandLargeDivRem into ExpandFp (#172680 ) Both passes expand instructions at the IR level. They use the same kind of instruction visitation logic and contain significant code duplication e.g. for scalarization.	2025-12-18 09:22:47 +01:00
Craig Topper	50ea2d8551	[RISCV] Extract vector from passthru when combining tuple_extract+vlseg. (#172743 ) The passthru operand is a tuple. We need to extract the correct field vector from it. Existing tests only handled the undef passthru case which accidentally worked. Possibly due to IMPLICIT_DEF being converted to noreg. Fixes #172628.	2025-12-17 22:45:47 -08:00
WANG Rui	8e648380a1	[LoongArch][NFC] Add tests for issue #172154	2025-12-18 14:26:16 +08:00
Craig Topper	6d405d6b5e	[RISCV] Replace enablePExtCodeGen with hasStdExtP for scalar code in RISCVISelDAGToDAG.cpp (#172785 ) The enablePExtCodeGen was only intended to block vector code while it is still in development. This code uses scalar types so we only need to check for the extension.	2025-12-17 22:22:05 -08:00
WANG Rui	55ff003344	[LoongArch][NFC] Partial revert "Custom lowering for vector logical right shifts of integers" This reverts commit a108881b24ecfea8d194b33dd9fb211943065bca, except for the tests.	2025-12-18 14:15:33 +08:00
Brandon Wu	1e90a273fe	[RISCV][llvm] Support fminimum, fmaximum, fminnum, fmaxnum, fminimumnum, fmaximumnum codegen for zvfbfa (#171794 ) This patch adds support for both scalable and fixed-length vectors. It also enables fsetcc pattern match for zvfbfa to make fminimum and fmaximum work correctly.	2025-12-18 14:12:04 +08:00
quic_hchandel	8a0cdb88f9	[RISCV] Add short forward branch support for `qc.e.lb(u)`, `qc.e.lh(u)` and `qc.e.lw` (#172629 )	2025-12-18 09:38:31 +05:30
Craig Topper	cd75676928	[RISCV] Prefer li over pli in RISCVMatInt. (#172778 ) li is compressible, pli is not.	2025-12-17 19:35:38 -08:00
hev	457f93d448	[LoongArch] Fix OptimizeW crash when MI operand is not a virtual register (#172604 ) Fixes #172600	2025-12-18 09:40:39 +08:00
Craig Topper	94e03a7894	[RISCV] Enable use of PACK in RISCVMatInt with P extension. (#172760 )	2025-12-17 17:32:04 -08:00
Mingjie Xu	796fafeff9	[IR] Update `PHINode::removeIncomingValueIf()` to use the swap strategy like `PHINode::removeIncomingValue()` (#172639 ) As suggested in https://github.com/llvm/llvm-project/pull/171963, update `PHINode::removeIncomingValueIf()` to use the swap strategy too.	2025-12-18 09:09:50 +08:00
Kevin Per	0036c67445	[RISCV]: Implemented softening of `FCANONICALIZE` (#169234 ) The `ISD::FCANONICALIZE` is mapped to `llvm.minnum(x, x)`. Closes https://github.com/llvm/llvm-project/issues/169216	2025-12-17 16:38:18 -08:00
Rahman Lavaee	53005fd435	Use the Propeller CFG profile in the PGO analysis map if it is available. (#163252 ) This PR implements the emitting of the post-link CFG information in PGO analysis map, as explained in the [RFC](https://discourse.llvm.org/t/rfc-extending-the-pgo-analysis-map-with-propeller-cfg-frequencies/88617). This is enabled by a flag `pgo-analysis-map-emit-bb-sections-cfg`. This PR bumps the SHT_LLVM_BB_ADDR_MAP version to 5. Also includes some refactoring changes related to storing the CFG in the Basic block sections profile reader.	2025-12-17 14:19:18 -08:00
Valeriy Savchenko	e7892d702f	[DAGCombiner] Fix assertion failure in vector division lowering (#172321 )	2025-12-17 22:09:54 +00:00
Folkert de Vries	a587ccd87d	fix `llvm.fma.f16` double rounding issue when there is no native support (#171904 ) fixes https://github.com/llvm/llvm-project/issues/98389 As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does not work, because there is not enough precision to handle the repeated rounding. `f64` does have sufficient space. So this PR explicitly promotes the 16-bit fma to a 64-bit fma. I could not find examples of a libcall being used for fma, but that's something that could be looked into separately to work around code size issues.	2025-12-17 22:03:01 +01:00
Pan Tao	b6bfa85686	[aarch64] Mix the frame pointer with the stack cookie when protecting the stack (#161114 ) This strengthens the guard and matches MSVC. Fixes #156573 .	2025-12-17 12:52:28 -08:00
Yonah Goldberg	f09f578c0d	[NVPTX][DagCombiner] Eliminate guards on shift amount because PTX shifts automatically clamp (#172431 ) Transform patterns like: `(select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))` `(select (ult shift, BitWidth), (srl/shl x, shift), 0)` Into: `(srl/shl x, shift)` These patterns arise from C/C++ code like shift >= 32 ? 0 : x >> shift which guards against undefined behavior. PTX shr/shl instructions clamp shift amounts >= BitWidth to produce 0 for logical shifts, making the guard redundant.	2025-12-17 12:13:36 -08:00
Matt Arsenault	399b33086f	AMDGPU: Add baseline tests for fcopysign with 0 magnitude (#172698 )	2025-12-17 20:22:52 +01:00
Steven Perron	e2d21b2eb8	[SPIR-V] Legalize vector arithmetic and intrinsics for large vectors (#170668 ) This patch improves the legalization of vector operations, particularly focusing on vectors that exceed the maximum supported size (e.g., 4 elements for shaders). This includes better handling for insert and extract element operations, which facilitates the legalization of loads and stores for long vectors—a common pattern when compiling HLSL matrices with Clang. Key changes include: - Adding legalization rules for G_FMA, G_INSERT_VECTOR_ELT, and various arithmetic operations to handle splitting of large vectors. - Updating G_CONCAT_VECTORS and G_SPLAT_VECTOR to be legal for allowed types. - Implementing custom legalization for G_INSERT_VECTOR_ELT using the spv_insertelt intrinsic. - Enhancing SPIRVPostLegalizer to deduce types for arithmetic instructions and vector element intrinsics (spv_insertelt, spv_extractelt). - Refactoring legalizeIntrinsic to uniformly handle vector legalization requirements. The strategy for insert and extract operations mirrors that of bitcasts: incoming intrinsics are converted to generic MIR instructions (G_INSERT_VECTOR_ELT and G_EXTRACT_VECTOR_ELT) to leverage standard legalization rules (like splitting). After legalization, they are converted back to their respective SPIR-V intrinsics (spv_insertelt, spv_extractelt) because later passes in the backend expect these intrinsics rather than the generic instructions. This ensures that operations on large vectors (e.g., <16 x float>) are correctly broken down into legal sub-vectors.	2025-12-17 13:00:49 -05:00
natanelh-mobileye	fa78d6a5f1	[SDAG] Shrink (abd? (?ext x) (?ext y)) (#171865 ) Alive2 test: https://alive2.llvm.org/ce/z/maryYU Lit test before change: https://godbolt.org/z/nEKWdPbMv Fixes #171640	2025-12-17 16:30:52 +00:00
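
The rewrite named in the title of commit 4e675a0c45 above, usub.sat(a, 1) to a - (a != 0), can be checked with a small scalar sketch. This is only an illustration of the arithmetic identity, not the SelectionDAG code from the patch; the helper names `usub_sat`, `usub_sat_one` and `identity_holds_up_to` are hypothetical and the 32-bit width is an assumption.

```cpp
#include <cstdint>

// Reference semantics of an unsigned saturating subtract (hypothetical helper):
// the result clamps at 0 instead of wrapping around.
constexpr uint32_t usub_sat(uint32_t a, uint32_t b) {
    return a > b ? a - b : 0u;
}

// The rewritten form from the commit title: a - (a != 0).
// For a == 0 this subtracts 0; for any other a it subtracts 1.
constexpr uint32_t usub_sat_one(uint32_t a) {
    return a - static_cast<uint32_t>(a != 0);
}

// Compare both forms for every value in [0, limit].
inline bool identity_holds_up_to(uint32_t limit) {
    for (uint32_t a = 0;; ++a) {
        if (usub_sat(a, 1u) != usub_sat_one(a)) return false;
        if (a == limit) break;  // stop before ++a could wrap
    }
    return true;
}

// Compile-time spot checks at the edges of the range.
static_assert(usub_sat_one(0u) == 0u, "zero must stay zero");
static_assert(usub_sat_one(1u) == 0u, "one saturates to zero");
static_assert(usub_sat_one(UINT32_MAX) == UINT32_MAX - 1u, "no wrap at the top");
```

The comparison `a != 0` is the value a RISC-V `snez` produces, which is why the commit message mentions that instruction.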
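
Commit f09f578c0d above removes source-level guards on shift amounts because, per its message, PTX logical shifts clamp amounts >= BitWidth and produce 0. A rough scalar sketch of why the guard is redundant under that behaviour follows. `clamping_shr_model` is a hypothetical software model of a clamping shift written for this illustration, not an NVPTX API, and the 32-bit width is an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// The guarded source pattern quoted in the commit message:
// shift >= 32 ? 0 : x >> shift (avoids undefined behaviour in C/C++).
constexpr uint32_t guarded_shr(uint32_t x, uint32_t shift) {
    return shift >= 32u ? 0u : x >> shift;
}

// Hypothetical model of a clamping logical right shift: the amount is
// clamped to the bit width, and shifting by the full width yields 0.
// The shift is done in 64 bits so that an amount of 32 is well defined here.
inline uint32_t clamping_shr_model(uint32_t x, uint32_t shift) {
    const uint32_t clamped = std::min<uint32_t>(shift, 32u);
    return static_cast<uint32_t>(static_cast<uint64_t>(x) >> clamped);
}

// Compare the guarded form against the model for amounts in [0, max_shift].
inline bool guard_is_redundant(uint32_t x, uint32_t max_shift) {
    for (uint32_t s = 0; s <= max_shift; ++s) {
        if (guarded_shr(x, s) != clamping_shr_model(x, s)) return false;
    }
    return true;
}
```

If the two functions agree for every shift amount, the select in the guarded form adds nothing on a target whose shift already clamps, which is the property the combine relies on.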


62571 Commits