62571 Commits

Author SHA1 Message Date
Ricardo Jesus
e8391d4619
[AArch64] Set default schedule of load-acquire RCpc instructions. (#172881)
This patch sets the default schedule of RCpc load-acquires to WriteLD,
as is already done for rcpc-immo load-acquires.
2025-12-19 09:25:34 +00:00
Matt Arsenault
0db4393762
AMDGPU: Add baseline tests for f64 rsq pattern handling (#172052) 2025-12-19 10:12:25 +01:00
Justin Bogner
b324c9f4fa
[DirectX] Move memset and memcpy handling to a new pass. NFC (#172921)
This introduces the DXILMemIntrinsics pass and moves memset and memcpy
handling from DXILLegalize to here. We need to do this so that we can
handle memory intrinsics before the DXILResourceAccess pass so that we
can properly deal with arrays and large structures in resources.
2025-12-18 22:08:43 -07:00
Brandon Wu
0e03199e81
[RISCV][llvm] Remove custom legalization of fixed-length vector SPLAT_VECTOR (#172870)
BUILD_VECTOR is combined into SPLAT_VECTOR if the operation action of
SPLAT_VECTOR is not Expand. However, we already have custom handling of
BUILD_VECTOR for fixed-length vectors, which uses an explicit constant VL
instead of the VLMAX it would get if lowered through SPLAT_VECTOR.
2025-12-19 11:45:10 +08:00
Alex MacLean
a40f444265
[NVPTX] Add support for barrier.cta.red.* instructions (#172541)
This change adds full support for the ptx `barrier.cta.red` instruction,
following the same conventions as are already used for
`barrier.cta.sync` and `barrier.cta.arrive`.

In addition this MR removes the following intrinsics which are no longer
needed:
* llvm.nvvm.barrier0.popc -->
  llvm.nvvm.barrier.cta.red.popc.aligned.all(0, c)
* llvm.nvvm.barrier0.and -->
  llvm.nvvm.barrier.cta.red.and.aligned.all(0, z)
* llvm.nvvm.barrier0.or -->
  llvm.nvvm.barrier.cta.red.or.aligned.all(0, z)
2025-12-18 18:06:27 -08:00
KRM7
c9aea6248a
[RegisterCoalescer] Don't commute two-address instructions which only define a subregister (#169031)
Currently, the register coalescer may try to commute an instruction
like:
```
%0.sub_lo32:gpr64 = AND %0.sub_lo32:gpr64(tied-def 0), %1.sub_lo32:gpr64
USE %0:gpr64
```
resulting in:
```
%1.sub_lo32:gpr64 = AND %1.sub_lo32:gpr64(tied-def 0), %0.sub_lo32:gpr64
USE %1:gpr64
```
However, this is not correct if the instruction doesn't define the
entire register, as the value of the upper 32 bits
of the register used in `USE` will not be the same.
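
To make the problem concrete, here is a minimal sketch (not part of the patch) that models the two 64-bit registers as Python integers. The helper `and_lo32` and the register values are hypothetical and only mimic a tied-def AND on the low 32-bit subregister.

```python
MASK32 = 0xFFFF_FFFF

def and_lo32(dst: int, src: int) -> int:
    """Model `dst.sub_lo32 = AND dst.sub_lo32(tied-def), src.sub_lo32`.

    Only the low 32 bits of dst are redefined; the upper 32 bits keep
    whatever dst held before the instruction.
    """
    upper = dst & ~MASK32
    lower = dst & src & MASK32
    return upper | lower

# Two registers whose upper halves differ (illustrative values).
r0 = 0x1111_1111_F0F0_F0F0
r1 = 0x2222_2222_0FF0_0FF0

original = and_lo32(r0, r1)  # %0.sub_lo32 = AND %0.sub_lo32, %1.sub_lo32; USE %0
commuted = and_lo32(r1, r0)  # %1.sub_lo32 = AND %1.sub_lo32, %0.sub_lo32; USE %1

# The low halves agree, but USE now observes a different upper half.
print(hex(original))
print(hex(commuted))
```

The low 32 bits match in both cases, so the commute is only safe when the instruction defines the whole register.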
2025-12-18 23:24:44 +01:00
Harald van Dijk
a9b62e8324
[AArch64] Make IFUNC opt-in rather than opt-out. (#171648)
IFUNCs require loader support, so for arbitrary environments, the safe
assumption is that they are not supported. In particular,
aarch64-linux-pauthtest may be used with musl, and was wrongly detected
as supporting IFUNCs.

With IFUNC support now being detected more reliably, this also removes
the check for PAuth support. If both are supported, either would work.
2025-12-18 22:17:07 +00:00
Justin Bogner
c3039a7dc5
[DirectX] Avoid precalculating GEPs in DXILResourceAccess (#172720)
Instead of trying to precalculate GEP offsets ahead of time and then
process resource accesses based off of these offsets, traverse the GEP
chain inline for each access. This makes it easier to get the types
correct when translating GEPs for cbuffer and structured buffer
accesses, which in turn lets us access individual elements of those
structures directly.

Fixes #160208, #164517, and #169430
2025-12-18 22:15:12 +00:00
Erik Enikeev
4cbaa40f70
[mips][micromips] Add mayRaiseFPException to appropriate instructions, mark all instructions that read FCSR (FCR31) rounding bits as doing so (#170322) 2025-12-18 23:06:36 +01:00
vangthao95
031e9c989e
[AMDGPU][GlobalISel] Add RegBankLegalize support for G_FPTRUNC (#171723) 2025-12-18 13:16:46 -08:00
Steven Perron
7486c6987e
[SPIRV] Restrict OpName generation to major values (#171886)
Refines OpName emission to only target Global Variables, Functions,
Function Parameters, Local Variables (allocas/phis), and Basic Blocks.
This reduces binary size and clutter by avoiding OpName for every
intermediate instruction (arithmetic, casts, etc.), while preserving
readability for interfaces and program structure.

Also updates the test suite to align with this change:
- Removes OpName checks for intermediate instructions.
- Adds side-effects (e.g., volatile stores) to tests where instructions
  were previously kept alive solely by their OpName usage.
- Updates checks to use generic ID matching where specific names are no
  longer available.
- Adds debug-info/opname-filtering.ll to verify the new policy.
2025-12-18 19:46:59 +00:00
Arthur Eubanks
4204244301
[X86] Fix sext optimization accidentally applying to large code model (#172721)
Otherwise we were seeing "unsupported relocation" errors when
referencing a small symbol under the large code model.

This regresses some cases where a large function references a small
global (e.g. relocimm-code-model.ll), but that's probably not super
important.
2025-12-18 18:39:13 +00:00
Gaëtan Bossu
ef58e6f6af
[SDAG] Widen TRUNCATE to intermediate type to avoid ISel failure (#172473)
SelectionDAG offered no way to widen TRUNCATE for pathological types
like <vscale x 1 x ...> as they do not allow scalarisation.

One way to go further is to widen to an intermediate type, which
allows promoting the element type in a later run of legalisation.
2025-12-18 17:19:34 +00:00
vangthao95
55089733b6
[AMDGPU][GlobalISel] Add readanylane combines for merge-like instructions (#172546)

When a merge-like instruction has all readanylane sources and the result
is copied to VGPRs, eliminate the readanylanes by either using the
original unmerge source directly or building a new merge with the VGPR
sources.
2025-12-18 08:04:06 -08:00
Sudharsan Veeravalli
3bf0a8d6e1
[RISCV] Add Xqci feature flag (#172608)
This patch adds an experimental Xqci feature flag that covers all the
sub-extensions in the Qualcomm uC Extension.
2025-12-18 21:32:49 +05:30
Simon Pilgrim
50bda7296b
[X86] combineConcatVectorOps - add handling for SITOFP vector ops (#172866) 2025-12-18 15:44:16 +00:00
Craig Topper
a256c03206
[RISCV] Rename -enable-p-ext-codegen to -riscv-enable-p-ext-simd-codegen. (#172790)
Make it clear this only applies to SIMD code and that it belongs to
RISC-V.
2025-12-18 07:11:16 -08:00
Min-Yih Hsu
e742015f43
[RISCV] Assign separate latencies for vector COPYs in SpacemitX60 scheduling model (#172556)
Currently, we assign the same scheduling info to COPY regardless of
whether it's a scalar or vector one. But this might cause a vector COPY
from physical registers to be scheduled too close to its consumer,
prolonging the physical register live range and running out of registers
during RA, as seen in #167008.

This patch addresses this issue by creating schedule variants for COPY
instructions of vector register classes so that they can have the same
latency as simple vector arithmetic (WriteVIALUV). It is worth noting
that we _only_ need latency in this case: keeping processor resources
in (vector) COPYs still causes the aforementioned register shortage
issue, because these COPYs might then be blocked by structural hazards
and, again, get sunk further down than we want.
2025-12-18 07:04:42 -08:00
macurtis-amd
e741cd88a1
AMDGPU/PromoteAlloca: Fix handling of users of multiple allocas (#172771)
With recent refactoring, LDS promotion worklists for all allocas are
populated upfront. In some cases, this results in a User in multiple
lists. Then as each list is processed, a User might get deleted via
removeFromParent, potentially leaving a dangling pointer in a subsequent
worklist.

Currently this only occurs for memcpy and memmove. Prior to refactoring,
these were handled by DeferredInstr, and were processed after the last
use of the then singular worklist.

This change moves processing of DeferredInstr to after all worklists
have been processed.
2025-12-18 08:41:21 -06:00
guan jian
4e675a0c45
[SelectionDAG] Lowering usub.sat(a, 1) to a - (a != 0) (#170076)
I recently observed that LLVM generates the following code:
```
	addi	a1, a0, -1
	sltu	a0, a0, a1
	addi	a0, a0, -1
	and	a0, a0, a1
	ret
```
This could be optimized using the snez instruction instead.
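
The identity in the title can be checked exhaustively on a small bit width. The following is a minimal sketch, not the patch itself; `usub_sat` and `rewritten` are hypothetical helper names that model 8-bit unsigned values.

```python
BITS = 8
MASK = (1 << BITS) - 1

def usub_sat(a: int, b: int) -> int:
    """Unsigned saturating subtract: clamps at 0 instead of wrapping."""
    return max(a - b, 0)

def rewritten(a: int) -> int:
    """The form from the commit title: a - (a != 0), wrapped to BITS bits."""
    return (a - int(a != 0)) & MASK

# Exhaustive comparison over every 8-bit input.
mismatches = [a for a in range(1 << BITS) if usub_sat(a, 1) != rewritten(a)]
print(mismatches)
```

For `a == 0` the comparison yields 0 and nothing is subtracted; for any other value exactly 1 is subtracted, which never underflows.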
2025-12-18 14:31:53 +00:00
Simon Pilgrim
345d763986
[X86] Add tests showing failure to concat matching SITOFP/UITOFP vector ops (#172852)
Tests have to perform an additional FADD to prevent
combineConcatVectorOfCasts from performing the fold - we're trying to
show when this fails to occur during a combineConcatVectorOps recursion.

Interestingly, due to uitofp expansion AVX1/2 is often managing to
concat where AVX512 can't.
2025-12-18 14:28:12 +00:00
Benjamin Maxwell
492ca62e2c
[AArch64][SVE] Generalize extract_elt => plast fold to i32 indices (#172692)
This occurs after type legalization, so the index type can be i32 or
i64. This patch simplifies the matching and checks for the optional zero
extend.

Also, a few tests from when this fold was added had broken due to
incorrectly adding `nuw` to the `add <eltCount>, #-1`, which this patch
corrects.
2025-12-18 14:15:20 +00:00
Simon Pilgrim
cd7c511cc0
[X86] combineConcatVectorOps - add handling for CVTPS2DQ/CVTTPS2DQ vector ops (#172841) 2025-12-18 12:52:11 +00:00
Paul Walker
cba7bb9d2f
[LLVM][CodeGen][X86] Make printConstant's output for vector ConstantFP match that of ConstantVector. (#172679) 2025-12-18 11:58:05 +00:00
Simon Pilgrim
5f84dfff53
[X86] Add tests showing failure to concat matching CVTPS2DQ/CVTTPS2DQ vector ops (#172836) 2025-12-18 11:55:21 +00:00
Frederik Harwath
5c05824d2b
[CodeGen] Rename expand-fp to expand-ir-insts (#172681)
The pass now contains a non-fp expansion and should
be used for any similar expansions regardless of the
types involved. Hence a generic name seems apt.

Rename the source files, pass, and adjust the pass
description. Move all tests for the expansions
that have previously been merged into the pass
to a single directory.
2025-12-18 11:15:04 +00:00
Matt Arsenault
d6f159dd05
AMDGPU: Add pattern for copysign of 0 (#172699)
Avoiding v_bfi_b32 is desirable since on gfx9 it
requires materializing the constant.

Similar could be done for infinity, with or 0x7fffffff
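
As a bit-level illustration (a sketch, not the pattern added by the patch): copysign with a zero magnitude keeps only the sign bit of the sign operand, so no bitfield insert is needed to compute it. The helpers `f32_bits` and `copysign_zero_bits` below are hypothetical names.

```python
import math
import struct

SIGN_MASK = 0x8000_0000

def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def copysign_zero_bits(sign_src: float) -> int:
    """Bit pattern of copysign(0.0, sign_src): just the sign bit of sign_src."""
    return f32_bits(sign_src) & SIGN_MASK

samples = (1.5, -2.0, 0.0, -0.0, float("inf"), float("-inf"))
results = [
    f32_bits(math.copysign(0.0, x)) == copysign_zero_bits(x) for x in samples
]
print(results)
```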
2025-12-18 11:34:24 +01:00
Nathan Gauër
8cfda79105
[HLSL][SPIR-V] Implement vk::push_constant (#166793)
Implements initial support for vk::push_constant.
As is, this allows handling simple push constants, but has one
main issue: layout can be incorrect (see #168401). Since the layout
issue is not specific to push constants, it is out of scope for this PR.

The frontend part of the implementation is straightforward:
 - adding a new attribute
 - when targeting vulkan/spirv, we process it
 - global variables with this attribute get a new AS:
   hlsl_push_constant

The IR has nothing specific, only some RO globals in this new AS.

On the SPIR-V side, we now convert this AS into a PushConstant storage
class. But this creates some issues: the variables in this storage class
must have a specific set of decorations to define their layout.

Current infra to create the SPIR-V types lacks the context required to
make this decision: no indication on the AS or context around the type
being created. Refactoring this would be a heavy task as it would
require getting this information in every place using the GR for type
creation.

Instead, we do something similar to CBuffers:
 - find all globals with this address space, and change their type to
   a target-specific type.
 - insert a new intrinsic in place of every reference to this global
   variable.

This allows the backend to handle both layout variable loads and type
lowering independently.

Type lowering has nothing specific: when we encounter a target extension
type with spirv.PushConstant, we lower this to the correct SPIR-V type
with the proper offset & block decorations.

As for the intrinsic, it's mostly a no-op, but required since we have
this target-specific type.

Note: this implementation prevents the static declaration of multiple
push constants in a single shader module. The actual specification is
more relaxed: there can be only one **used** push constant block per
entrypoint. To implement this correctly, we would need to keep some
additional state to determine the list of statically used resources per
entrypoint. This shall be addressed as a follow-up (see #170310).
2025-12-18 11:01:11 +01:00
Benjamin Maxwell
c8bf963282
[AArch64][SVE] Rework VECTOR_COMPRESS lowering (#171162)
This removes the use of `LowerVECTOR_COMPRESS` in `ReplaceNodeResults`
(which was used to promote illegal integer VTs), and instead only marks
the legal VTs as "Custom" (allowing for standard type legalization). 

This patch also simplifies the lowering by using the existing
fixed-length <-> SVE conversion helpers. 

This was intended to be an NFC, but it appears to have caused some minor
code-gen changes/improvements.
2025-12-18 09:34:17 +00:00
Kevin Per
98b82f90df
[PowerPC]: Add check for cast when shufflevector (#172443)
The crash happens because the cast for `Mask =
cast<ShuffleVectorSDNode>(Res)->getMask();` fails for node `t197: v16i8
= vector_shuffle<16,17,18,19,4,5,6,7,8,9,10,11,u,u,u,u> t196, t196`.
However, both `LHS` and `RHS` are the same node, so
`DAG.getCommutedVectorShuffle` doesn't return a `ShuffleVectorSDNode`
and crashes. The fix is to add a check before the cast is performed.

Closes https://github.com/llvm/llvm-project/issues/172265
2025-12-18 17:14:01 +08:00
Frederik Harwath
71760f324f
[CodeGen] Merge ExpandLargeDivRem into ExpandFp (#172680)
Both passes expand instructions at the IR level.
They use the same kind of instruction visitation
logic and contain significant code duplication e.g.
for scalarization.
2025-12-18 09:22:47 +01:00
Craig Topper
50ea2d8551
[RISCV] Extract vector from passthru when combining tuple_extract+vlseg. (#172743)
The passthru operand is a tuple. We need to extract the correct field
vector from it.

Existing tests only handled the undef passthru case, which accidentally
worked, possibly due to IMPLICIT_DEF being converted to noreg.

Fixes #172628.
2025-12-17 22:45:47 -08:00
WANG Rui
8e648380a1
[LoongArch][NFC] Add tests for issue #172154 2025-12-18 14:26:16 +08:00
Craig Topper
6d405d6b5e
[RISCV] Replace enablePExtCodeGen with hasStdExtP for scalar code in RISCVISelDAGToDAG.cpp (#172785)
enablePExtCodeGen was only intended to block vector code while
it is still in development. This code uses scalar types, so we only
need to check for the extension.
2025-12-17 22:22:05 -08:00
WANG Rui
55ff003344
[LoongArch][NFC] Partial revert "Custom lowering for vector logical right shifts of integers"
This reverts commit a108881b24ecfea8d194b33dd9fb211943065bca, except for the tests.
2025-12-18 14:15:33 +08:00
Brandon Wu
1e90a273fe
[RISCV][llvm] Support fminimum, fmaximum, fminnum, fmaxnum, fminimumnum, fmaximumnum codegen for zvfbfa (#171794)
This patch supports both scalable and fixed-length vectors.
It also enables fsetcc pattern match for zvfbfa to make fminimum and
fmaximum work correctly.
2025-12-18 14:12:04 +08:00
quic_hchandel
8a0cdb88f9
[RISCV] Add short forward branch support for qc.e.lb(u), qc.e.lh(u) and qc.e.lw (#172629) 2025-12-18 09:38:31 +05:30
Craig Topper
cd75676928
[RISCV] Prefer li over pli in RISCVMatInt. (#172778)
li is compressible, pli is not.
2025-12-17 19:35:38 -08:00
hev
457f93d448
[LoongArch] Fix OptimizeW crash when MI operand is not a virtual register (#172604)
Fixes #172600
2025-12-18 09:40:39 +08:00
Craig Topper
94e03a7894
[RISCV] Enable use of PACK in RISCVMatInt with P extension. (#172760) 2025-12-17 17:32:04 -08:00
Mingjie Xu
796fafeff9
[IR] Update PHINode::removeIncomingValueIf() to use the swap strategy like PHINode::removeIncomingValue() (#172639)
As suggested in https://github.com/llvm/llvm-project/pull/171963, update
`PHINode::removeIncomingValueIf()` to use the swap strategy too.
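
For readers unfamiliar with the swap strategy, here is a minimal sketch on a plain Python list. It is an illustration of the general technique, not the LLVM implementation; `remove_if_swap` is a hypothetical name, and a real PHI node also has to keep its incoming blocks in sync with its incoming values.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def remove_if_swap(items: List[T], pred: Callable[[T], bool]) -> None:
    """Remove every element matching pred, in place, without preserving order.

    Instead of shifting the tail down for each removal, the last live
    element is moved into the hole, so each removal costs O(1).
    """
    i = 0
    n = len(items)
    while i < n:
        if pred(items[i]):
            n -= 1
            items[i] = items[n]  # fill the hole with the last live element
            # Do not advance i: the swapped-in element is still unchecked.
        else:
            i += 1
    del items[n:]  # drop the dead tail in one step

values = [1, 2, 3, 4, 5, 6]
remove_if_swap(values, lambda v: v % 2 == 0)
print(values)
```

The trade-off is that the surviving elements may end up in a different order than before.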
2025-12-18 09:09:50 +08:00
Kevin Per
0036c67445
[RISCV]: Implemented softening of FCANONICALIZE (#169234)
The `ISD::FCANONICALIZE` is mapped to `llvm.minnum(x, x)`.

Closes https://github.com/llvm/llvm-project/issues/169216
2025-12-17 16:38:18 -08:00
Rahman Lavaee
53005fd435
Use the Propeller CFG profile in the PGO analysis map if it is available. (#163252)
This PR emits the post-link CFG information in the PGO
analysis map, as explained in the
[RFC](https://discourse.llvm.org/t/rfc-extending-the-pgo-analysis-map-with-propeller-cfg-frequencies/88617).
This is enabled by the flag `pgo-analysis-map-emit-bb-sections-cfg`.

This PR bumps the SHT_LLVM_BB_ADDR_MAP version to 5.
Also includes some refactoring changes related to storing the CFG in the
Basic block sections profile reader.
2025-12-17 14:19:18 -08:00
Valeriy Savchenko
e7892d702f
[DAGCombiner] Fix assertion failure in vector division lowering (#172321) 2025-12-17 22:09:54 +00:00
Folkert de Vries
a587ccd87d
fix llvm.fma.f16 double rounding issue when there is no native support (#171904)
fixes https://github.com/llvm/llvm-project/issues/98389

As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does
not work, because there is not enough precision to handle the repeated
rounding. `f64` does have sufficient space. So this PR explicitly
promotes the 16-bit fma to a 64-bit fma.

I could not find examples of a libcall being used for fma, but that's
something that could be looked into separately to work around code size
issues.
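
The double rounding can be reproduced with one concrete input. The sketch below is an illustration, not code from the patch: the operands were picked by hand so that `a * b + c` equals `1 + 2^-11 + 2^-26`, and `to_f16` / `to_f32` are hypothetical helpers that round a Python float (a binary64 value) through the `struct` module.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_f32(x: float) -> float:
    """Round x to the nearest binary32 value (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# All three operands are exactly representable in binary16.
a = 99 / 8192
b = 331 / 8192
c = 1.0

# a * b == 2**-11 + 2**-26, so the sum needs 27 significand bits and is
# still exact in binary64.
exact = a * b + c

# f32 promotion: the 2**-26 term is lost, leaving an exact tie for f16,
# which then rounds to even (down).
via_f32 = to_f16(to_f32(exact))

# f64 promotion: the tiny term survives and breaks the tie upward.
via_f64 = to_f16(exact)

print(via_f32, via_f64)
```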
2025-12-17 22:03:01 +01:00
Pan Tao
b6bfa85686
[aarch64] Mix the frame pointer with the stack cookie when protecting the stack (#161114)
This strengthens the guard and matches MSVC.

Fixes #156573.
2025-12-17 12:52:28 -08:00
Yonah Goldberg
f09f578c0d
[NVPTX][DagCombiner] Eliminate guards on shift amount because PTX shifts automatically clamp (#172431)
Transform patterns like:

`(select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))`
`(select (ult shift, BitWidth), (srl/shl x, shift), 0)`

Into:

`(srl/shl x, shift)`

These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift`,
which guards against undefined behavior. PTX shr/shl instructions clamp
shift amounts >= BitWidth to produce 0 for logical shifts, making the
guard redundant.
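
The equivalence can be sketched with a small model of 32-bit logical shifts. This is an illustration under the stated assumption that the shift amount is clamped to the bit width; `ptx_shr_u32`, `ptx_shl_b32`, and `guarded` are hypothetical names, not real APIs.

```python
BIT_WIDTH = 32
MASK = (1 << BIT_WIDTH) - 1

def ptx_shr_u32(x: int, shift: int) -> int:
    """Model of a clamping logical right shift: amounts above 32 act as 32."""
    return (x >> min(shift, BIT_WIDTH)) & MASK

def ptx_shl_b32(x: int, shift: int) -> int:
    """Model of a clamping left shift: amounts above 32 act as 32."""
    return (x << min(shift, BIT_WIDTH)) & MASK

def guarded(op, x: int, shift: int) -> int:
    """The source-level guard: (select (ugt shift, BitWidth-1), 0, (op x, shift))."""
    return 0 if shift > BIT_WIDTH - 1 else op(x, shift)

samples = (0, 1, 0x8000_0000, 0xDEAD_BEEF, MASK)
guard_is_redundant = all(
    guarded(op, x, s) == op(x, s)
    for op in (ptx_shr_u32, ptx_shl_b32)
    for x in samples
    for s in range(0, 70)
)
print(guard_is_redundant)
```

The same check would fail for an arithmetic right shift of a negative value, which is why the commit message restricts the claim to logical shifts.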
2025-12-17 12:13:36 -08:00
Matt Arsenault
399b33086f
AMDGPU: Add baseline tests for fcopysign with 0 magnitude (#172698) 2025-12-17 20:22:52 +01:00
Steven Perron
e2d21b2eb8
[SPIR-V] Legalize vector arithmetic and intrinsics for large vectors (#170668)
This patch improves the legalization of vector operations, particularly
focusing on vectors that exceed the maximum supported size (e.g., 4
elements for shaders). This includes better handling for insert and
extract element operations, which facilitates the legalization of loads
and stores for long vectors, a common pattern when compiling HLSL
matrices with Clang.

Key changes include:
- Adding legalization rules for G_FMA, G_INSERT_VECTOR_ELT, and various
  arithmetic operations to handle splitting of large vectors.
- Updating G_CONCAT_VECTORS and G_SPLAT_VECTOR to be legal for allowed
  types.
- Implementing custom legalization for G_INSERT_VECTOR_ELT using the
  spv_insertelt intrinsic.
- Enhancing SPIRVPostLegalizer to deduce types for arithmetic
  instructions and vector element intrinsics (spv_insertelt,
  spv_extractelt).
- Refactoring legalizeIntrinsic to uniformly handle vector legalization
  requirements.

The strategy for insert and extract operations mirrors that of bitcasts:
incoming intrinsics are converted to generic MIR instructions
(G_INSERT_VECTOR_ELT and G_EXTRACT_VECTOR_ELT) to leverage standard
legalization rules (like splitting). After legalization, they are
converted back to their respective SPIR-V intrinsics (spv_insertelt,
spv_extractelt) because later passes in the backend expect these
intrinsics rather than the generic instructions.

This ensures that operations on large vectors (e.g., <16 x float>) are
correctly broken down into legal sub-vectors.
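
Conceptually, splitting an element-wise operation works as in the sketch below. This is a simplified model on Python lists, not the GlobalISel legalizer; `split_elementwise` and the 4-lane limit are assumptions taken from the shader example above.

```python
from typing import Callable, List

MAX_LANES = 4  # assumed maximum legal vector width for shaders

def split_elementwise(
    op: Callable[[float, float], float],
    lhs: List[float],
    rhs: List[float],
    max_lanes: int = MAX_LANES,
) -> List[float]:
    """Apply op lane by lane, one legal sub-vector at a time, then concatenate."""
    result: List[float] = []
    for start in range(0, len(lhs), max_lanes):
        sub_lhs = lhs[start:start + max_lanes]  # one legal sub-vector
        sub_rhs = rhs[start:start + max_lanes]
        result.extend(op(a, b) for a, b in zip(sub_lhs, sub_rhs))
    return result

# A <16 x float> add becomes four 4-lane adds.
lhs = [float(i) for i in range(16)]
rhs = [10.0] * 16
total = split_elementwise(lambda a, b: a + b, lhs, rhs)
print(total)
```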
2025-12-17 13:00:49 -05:00
natanelh-mobileye
fa78d6a5f1
[SDAG] Shrink (abd? (?ext x) (?ext y)) (#171865)
Alive2 test: https://alive2.llvm.org/ce/z/maryYU
Lit test before change: https://godbolt.org/z/nEKWdPbMv

Fixes #171640
2025-12-17 16:30:52 +00:00