39449 Commits

Author SHA1 Message Date
Trung Nguyen
b6e7c475cb
[CodeGen] Ignore ANNOTATION_LABEL in scheduler (#190499)
This fixes a crash in `clang` for `armv7` targets when optimizations are
enabled.

Fixes #190497
2026-04-06 14:16:01 +02:00
Kartik Ohlan
7c60d08056
[DAG] computeKnownFPClass - add ISD::SPLAT_VECTOR handling (#189780)
Fixes #189481

Implement ISD::SPLAT_VECTOR in SelectionDAG::computeKnownFPClass to
correctly propagate floating-point properties from scalar operands to
vectors.

Added AArch64 and RISC-V test coverage
2026-04-04 14:54:12 +00:00
Alan Li
5e0efc0f1d
Reland "[GlobalISel][LLT] Introduce FPInfo for LLT (Enable bfloat, ppc128float and others in GlobalISel) (#155107)" (#188502)
This is a reland of https://github.com/llvm/llvm-project/pull/155107
along with a fix for old gcc builds.

This patch was reverted in
https://github.com/llvm/llvm-project/pull/188344 due to compilation
failures described in
https://github.com/llvm/llvm-project/pull/155107#issuecomment-4121292756

The fix to old gcc builds is to remove `constexpr` modifiers in the
original patch in 0721d8e7768c011b8cf2d4d223ca6eca3392b1f9
2026-04-04 05:57:13 -07:00
Craig Topper
c7824ac669
[TargetLowering] Remove stale comment. NFC (#190275)
Missed removing in #188653
2026-04-03 14:26:09 -07:00
Simon Pilgrim
6832709dc0
[DAG] SDPatternMatch - rename m_Opc -> m_SpecificOpc (#190215)
Matches the naming convention of the other m_Specific* matchers, and frees
up the m_Opc() matcher for future use in #84940 to allow us to capture the
opcode of an unknown binop

Moving to m_SpecificOpc does mess up the formatting in a few places;
I've tried to refactor to use the m_Value(SDValue, ....) matcher where I
can to recover some whitespace
2026-04-03 18:03:00 +00:00
Craig Topper
5d08beaec8
[TargetLowering] Remove NeedToApplyOffset from prepareSREMEqFold. NFC (#190256)
For a given element, I believe A is only 0 when the divisor is INT_MIN.
The only way for NeedToApplyOffset to be false after processing all
elements, is for all divisors to be INT_MIN. If all divisors are
INT_MIN, then all divisors are a power of 2 and we wouldn't do the
transform.
2026-04-03 07:32:13 -07:00
Yuta Saito
fd65b3ef77
[GlobalISel] Fix UMR in SwiftErrorValueTracking (#190273)
Fix issue reported on
https://github.com/llvm/llvm-project/pull/188296#issuecomment-4179103756

`SwiftErrorValueTracking` holds per-function state used by
`IRTranslator`.

On targets where `TargetLowering::supportSwiftError()` is false (e.g.
wasm), `SwiftErrorValueTracking::setFunction()` exits early.
Historically, that early return happened before clearing per-function
containers, and pointer members (including `SwiftErrorArg`) had no
in-class initialization.

The bad case is a function with a swifterror argument on such a target:
`IRTranslator` uses `SwiftError.getFunctionArg()` without checking
`supportSwiftError()` and this could read an uninitialized
`SwiftErrorArg` value. (SelectionDAG gates the `getFunctionArg` usages
behind `supportSwiftError()`, so it's specific to GlobalISel)

29391328ab66 added [a first test
case](llvm/test/CodeGen/WebAssembly/GlobalISel/irtranslator/args-swiftcc.ll)
that satisfies:
- the target has `supportSwiftError()` = false
- it uses swiftcc
- it uses GlobalISel

and it made the issue observable with sanitizer builds. This commit
fixes the per-function container reinitialization and defensively adds
explicit pointer member initializations.
2026-04-03 14:33:35 +01:00
Simon Pilgrim
5674755cb6
[DAG] visitMUL - cleanup pattern matchers to use m_Shl and (commutative) m_Mul directly (#190339)
Based on feedback on #190215
2026-04-03 13:21:51 +00:00
Simon Pilgrim
15ed4f6c49
[DAG] isKnownToBeAPowerOfTwo - add missing DemandedElts handling to ISD::TRUNCATE and hidden m_Neg pattern (#190190)
Use MaskedVectorIsZero to match X & -X pattern when only DemandedElts
match the negation pattern

Fixes #181654 (properly)
2026-04-03 12:03:33 +00:00
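The `X & -X` pattern mentioned in 15ed4f6c49 isolates the lowest set bit, so the result is always a power of two or zero. The following is a minimal scalar sketch of the idea that only the demanded lanes need to match the negation pattern. The helper names and the bitmask encoding of `DemandedElts` are assumptions for illustration, not the SelectionDAG implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if V has at most one bit set.
bool isPow2OrZero(uint32_t V) { return (V & (V - 1)) == 0; }

// Hypothetical scalar model: returns true if, in every demanded lane, N is
// the negation of X (i.e. X + N == 0 modulo 2^32). For such lanes, X & N
// is guaranteed to be a power of two or zero.
template <std::size_t NumElts>
bool demandedLanesMatchNeg(const std::array<uint32_t, NumElts> &X,
                           const std::array<uint32_t, NumElts> &N,
                           uint32_t DemandedElts) {
  for (std::size_t I = 0; I < NumElts; ++I) {
    bool Demanded = ((DemandedElts >> I) & 1u) != 0;
    if (Demanded && X[I] + N[I] != 0u)
      return false; // This demanded lane does not match X & -X.
  }
  return true;
}
```

With `X = {12, 5, 0, 8}` and `N = {-12, 7, 0, -8}`, lane 1 breaks the pattern, so the check only succeeds when lane 1 is not demanded.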
Ryotaro Kasuga
9e516f5c58
[MachinePipeliner] Remove isLoopCarriedDep and use DDG (#174394)
This patch completely removes `isLoopCarriedDep`, which was used
previously to identify loop-carried dependencies in the DAG. Now that we
have the DDG representation, this special handling is no longer
necessary. Simply replacing its usage with the DDG causes several tests
to fail, since cycle detection takes some of the validation-only edges
in the DDG into account. To address this, this patch introduces extra
edges in the DDG, which are used only for cycle detection and not for
other parts of the pass (e.g., scheduling). The extra edges are
determined to preserve the existing behavior of the pass as closely as
possible, which makes the predicates for adding them somewhat complex.

Split off from #135148, and the final patch in the series for #135148
2026-04-03 10:36:34 +00:00
Craig Topper
e2e5db8401
[TargetLowering] Speculative fix for a non-determinism issue between different compilers. (#190219)
The evaluation order of function arguments is unspecified by the C++
standard. We had two getNode calls as function arguments which causes
the nodes to be created in a different order depending on the compiler
used. This patch moves them to their own variables to ensure they are
called in the same order on all compilers.

Possible fix for #190148.
2026-04-02 12:12:28 -07:00
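The evaluation-order issue described in e2e5db8401 can be reproduced outside LLVM. The sketch below uses a hypothetical `Builder` that records the order in which nodes are created; it is not the `TargetLowering` code, only an illustration of why hoisting the calls into named locals makes the order deterministic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a node factory that records creation order.
struct Builder {
  std::vector<std::string> Order;
  int getNode(const std::string &Name) {
    Order.push_back(Name);
    return static_cast<int>(Order.size());
  }
};

int combine(int LHS, int RHS) { return LHS * 10 + RHS; }

// The two getNode calls are function arguments, so the order in which they
// run is unspecified and may differ between compilers.
int buildUnspecified(Builder &B) {
  return combine(B.getNode("lhs"), B.getNode("rhs"));
}

// Each call is a separate full expression, so "lhs" is always created first.
int buildDeterministic(Builder &B) {
  int LHS = B.getNode("lhs");
  int RHS = B.getNode("rhs");
  return combine(LHS, RHS);
}
```

Only `buildDeterministic` guarantees that `Order` is `{"lhs", "rhs"}` on every compiler.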
Craig Topper
24146ce5cf
[TargetLowering] Remove INT_MIN special case from prepareSREMEqFold. (#188653)
If the divisor is INT_MIN, we can still treat it like any other power of
2. We'll fold i32 (seteq (srem X, INT_MIN)) to
(setule (rotr (add (mul X, 1), INT_MIN), 31), 1). Alive2 says this is
correct https://alive2.llvm.org/ce/z/vjzqKk.

The multiply is a NOP, the add toggles the sign bit. The rotate puts
the lowest 31 bits of the sum into the upper 31 bits. The sign bit is now
in the LSB. The compare checks if the upper 31 bits are 0.

srem X, INT_MIN has a remainder of 0 if X is 0 or INT_MIN which is
equivalent to checking if the upper 31 bits are 0 after the rotate.

I don't think we need to add any constant for power of 2 but toggling
the sign bit like we do now doesn't hurt.
2026-04-02 09:45:47 -07:00
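The fold described in 24146ce5cf can be checked on plain 32-bit integers. This is a small sketch assuming two's complement `int32_t`; `rotr32`, `sremIsZero`, and `foldedIsZero` are illustrative names, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Reference form: (srem X, INT_MIN) == 0.
bool sremIsZero(int32_t X) { return X % INT32_MIN == 0; }

// Rotate right on 32 bits.
uint32_t rotr32(uint32_t V, unsigned Amt) {
  Amt &= 31u;
  return Amt == 0 ? V : (V >> Amt) | (V << (32u - Amt));
}

// Folded form: setule (rotr (add (mul X, 1), INT_MIN), 31), 1.
bool foldedIsZero(int32_t X) {
  uint32_t U = static_cast<uint32_t>(X);
  uint32_t Sum = U * 1u + 0x80000000u; // The add toggles the sign bit.
  return rotr32(Sum, 31) <= 1u;        // Upper 31 bits must all be zero.
}
```

Both predicates agree on the inputs tried here: only `0` and `INT32_MIN` produce a zero remainder.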
zGoldthorpe
e9a62c7698
[DAG] computeKnownFPClass: handle ISD::FABS (#190069)
Use `KnownFPClass::fabs` to handle `ISD::FABS`.

This case will help with updating #188356 to use `computeKnownFPClass`.
2026-04-02 14:48:54 +00:00
dibrinsofor
eaa3ef9ddc
[DAG] Propagate OrZero and DemandedElts for min/max in isKnownToBeAPowerOfTwo (#182369)
Fixes #181643 

For queries like `isKnownToBeAPowerOfTwo(V, OrZero=true)`, if an operand
is known to be "pow2-or-zero" but not strictly non-zero power-of-two,
the min/max case currently returns false even when the result remains
pow2-or-zero.

For instance:
- `A = select cond, 4, 0`  (A is pow2-or-zero)
- `R = umin(A, 16)`

`R` is always in `{0, 4}` and querying `isKnownToBeAPowerOfTwo(R,
OrZero=true)` should be true.

Added unit tests for the baseline and failing cases; `OrZero` and
`DemandedElts` are now propagated correctly.
2026-04-02 12:50:11 +01:00
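The `umin` example from eaa3ef9ddc can be written out on scalars. This is a minimal sketch with illustrative helper names; it only demonstrates the arithmetic fact that the minimum of two pow2-or-zero values is itself pow2-or-zero, not how `isKnownToBeAPowerOfTwo` is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Strict power of two (non-zero).
bool isPow2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Power of two or zero, matching the OrZero=true query.
bool isPow2OrZero(uint32_t V) { return (V & (V - 1)) == 0; }

// A = select cond, 4, 0;  R = umin(A, 16)
uint32_t example(bool Cond) {
  uint32_t A = Cond ? 4u : 0u;
  return std::min<uint32_t>(A, 16u);
}
```

`example` returns either `0` or `4`: the strict query fails for the zero case, while the `OrZero` query holds for both.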
Nerixyz
91b90652bb
Reland "[CodeView] Generate S_DEFRANGE_REGISTER_REL_INDIR" (#189401)
Initially added in #187709. It was reverted in #188833, because
[llvm-clang-x86_64-sie-win](https://lab.llvm.org/buildbot/#/builders/46/builds/32873)
was failing in
`cross-project-tests/debuginfo-tests/dexter-tests/nrvo.cpp`.

The test passed for me locally. After checking on another machine, I
found that `S_DEFRANGE_REGISTER_REL_INDIR` is only supported by
dbgeng/WinDbg from Windows 10.0 Build 19041 (released 2020) onwards.
SDKs before this will fail to read the value. That buildbot is on
Windows 10.0 Build 17763.

I'm not sure if we should make the generation of that record
conditional. Debuggers that can't read the record will skip it. They'll
still see that there's some local variable, but won't be able to display
the value.

As far as I know, users of older Windows 10 builds should be able to
install a newer Windows SDK and use the WinDbg from that version. But I
haven't tested that.
2026-04-02 12:15:11 +02:00
Gabriel Baraldi
5e0a06b34d
Move ExpandMemCmp and MergeIcmp to the middle end (#77370)
Moving these into the middle-end pipeline will allow for additional
optimization of the expansion result, such as CSE of redundant loads
(c.f. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place
the passes at the end of the middle-end pipeline, so we mostly don't
benefit from additional optimizations yet. The pipeline position will be
moved in a future change.

This builds on work done by legrosbuffle in
https://reviews.llvm.org/D60318.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:57:00 +02:00
David Green
083f9c158a
[AArch64][GISel] Widen non-power2 element sizes for ctlz. (#189371)
This addresses an illegal mutation kind, where gisel would hit an
assert. It expands vector elements for non-power2 elements or elements
less than i8 to a power of 2.

A fix to handle vector types correctly was needed in LegalizerHandler.

Fixes #185411
2026-04-02 07:27:12 +01:00
zGoldthorpe
9a354fc5a1
[SelectionDAG] Use KnownBits to determine if an operand may be NaN. (#188606)
Given a bitcast into a fp type, use the known bits of the operand to
infer whether the resulting value can never be NaN.
2026-04-01 22:47:01 -06:00
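The idea in 9a354fc5a1 can be sketched for a 32-bit integer bitcast to an IEEE-754 single. A value is NaN only if every exponent bit is one and the mantissa is non-zero, so a known-zero exponent bit or a fully known-zero mantissa rules NaN out. `KnownBits32` is a simplified, hypothetical stand-in for LLVM's `KnownBits`, and the exact conditions used by the patch are an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Simplified known-bits record: a set bit in Zero means that bit of the
// value is known to be 0; a set bit in One means it is known to be 1.
struct KnownBits32 {
  uint32_t Zero;
  uint32_t One;
};

// Returns true if an i32 with these known bits cannot be NaN after a
// bitcast to float.
bool bitcastToF32IsNeverNaN(KnownBits32 K) {
  const uint32_t ExpMask = 0x7F800000u;  // Bits 30..23.
  const uint32_t MantMask = 0x007FFFFFu; // Bits 22..0.
  if ((K.Zero & ExpMask) != 0)
    return true; // Exponent cannot be all ones.
  if ((K.Zero & MantMask) == MantMask)
    return true; // Mantissa is zero: at worst an infinity.
  return false;
}
```

When nothing is known about the bits, the function conservatively answers `false`.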
zGoldthorpe
d6d0876d1a
[NFC][SelectionDAG] Refactor out common default DemandedElts calculation (#190031)
Deduplicating the repeated pattern
```cpp
APInt DemandedElts = VT.isFixedLengthVector()
                         ? APInt::getAllOnes(VT.getVectorNumElements())
                         : APInt(1, 1);
```
in SelectionDAG.
2026-04-01 14:40:48 -06:00
zGoldthorpe
24b6ee90c1
[SelectionDAG] Assert on non-FP operand to computeKnownFPClass (#189752)
Assert correct usage of `computeKnownFPClass` or its users (i.e.,
`isKnownNeverNaN`).
2026-04-01 17:41:33 +00:00
zGoldthorpe
d7e129dffb
[SelectionDAGBuilder] Only check VPCmp for NaNs in fp comparisons (#189749)
`getFCmpCodeWithoutNaN` should only be used for FP comparisons (which is
also the only context in which `isKnownNeverNaN` makes sense).
2026-04-01 17:00:55 +00:00
LU-JOHN
c245d764b8
[CodeGen] Do not remove IMPLICIT_DEF unless all uses have undef flag added (#188133)
Do not remove IMPLICIT_DEF of a physreg unless all uses have an undef
flag added. Previously, only the first use instruction had undef flags
added. This will cause a failure in machine instruction verification.
Multi-instruction uses tested in AMDGPU/multi-use-implicit-def.mir and
X86/multi-use-implicit-def.mir.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2026-04-01 10:11:42 -05:00
Lucas Ramirez
54914a4287
[CodeGen] Allow rematerializer to rematerialize at the end of a block (#184339)
This makes the rematerializer able to rematerialize MIs at the end of a
basic block. We achieve this by tracking the parent basic block of every
region inside the rematerializer and adding an explicit target region to
some of the class's methods. The latter removes the requirement that we
track the MI of every region (`Rematerializer::MIRegion`) after the
analysis phase; the class member is therefore deleted.

This new ability will be used shortly to improve the design of the
rollback mechanism.
2026-04-01 16:58:44 +02:00
Pankaj Dwivedi
86c3abe85e
[NFC] Rename InstructionUniformity to ValueUniformity (#189935)
2026-04-01 19:28:33 +05:30
DaKnig
d6b8163f3f
Retry "[SDAG] (abs (add nsw a, -b)) -> (abds a, b) (#175801)" (#186659)
A better version of #175801; see that PR for more info.

Fixes #185467

The original patch was checking the correctness of the transformation
based on the original Op1, which was then negated (in the case of
IsAdd). This patch fixes that issue by inverting the sign bit in that
case.

Also pushed a slight NFC change there to simplify the code and remove
some duplication.

alive2 proofs:

abds: https://alive2.llvm.org/ce/z/oJQPss

abdu: https://alive2.llvm.org/ce/z/HfPF5q

Note that the regression test is no longer (wrongly) affected by the
patch (as it was before)
2026-04-01 13:37:29 +00:00
Gergo Stomfai
15d48c5bbe
[X86][DAG] remove LowerFCanonicalize (#188127)
Remove LowerFCanonicalize. Added fallback for cases when the scalar type also has its Custom lowering to avoid regressions on AMDGPU and SystemZ.

Fixes #143862
2026-04-01 13:34:05 +00:00
Simon Pilgrim
9a33125e42
[DAG] Add basic ISD::IS_FPCLASS constant/identity folds (#189944)
Attempts to match middle-end implementation in InstructionSimplify/foldIntrinsicIsFPClass

Fixes #189919
2026-04-01 13:06:27 +00:00
Lucas Ramirez
8a06085c61
[CodeGen] Add listener support to the rematerializer (NFC) (#184338)
This change adds support for adding listeners to the target-independent
rematerializer; listeners can catch certain rematerialization-related
events to implement some additional functionality on top of what the
rematerializer already performs.

This is NFC and has no user at the moment, but the plan is to have
listeners start being responsible for secondary/optional functionalities
that are at the moment integrated with the rematerializer itself. Two
examples of that are:

1. rollback support (currently optional), and
2. region tracking (currently mandatory, but not fundamentally necessary
to the rematerializer).
2026-04-01 13:35:37 +02:00
Luke Lau
effcd181e5
[RISCV] Remove codegen for VP float rounding intrinsics (#189896)
Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off seven intrinsics from #179622.

We now generate vfcvt.rtz for llvm.vp.roundtozero. It looks like we
should have been using the codegen for llvm.trunc for it, but we somehow
missed that.
2026-04-01 11:04:53 +00:00
Luke Lau
1d549d9a77
[RISCV] Remove codegen for vp_lrint, vp_llrint (#189714)
Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off two intrinsics from #179622.

We need to use the other intrinsic constructor in
ExpandVectorPredication.cpp because llrint has multiple overloaded types
2026-04-01 06:46:38 +00:00
Henry Jiang
bf50489eeb
[Psuedoprobe][MachO] Enable pseudo probes emission for MachO (#185758)
Enable pseudo probes emission for MachO. Due to the 16 character limit
of MachO segment and section, the file sections will be
`__PSEUDO_PROBE,__probes` and `__PSEUDO_PROBE,__probe_descs`.
2026-03-31 16:27:58 -07:00
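The 16-character limit mentioned in bf50489eeb is easy to check for the names chosen. The helper below is a hypothetical sketch, not an LLVM API; it simply compares a name's length against the 16-byte Mach-O segment/section name field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Mach-O stores segment and section names in fixed 16-byte fields.
constexpr std::size_t MachONameLimit = 16;

// Hypothetical helper: true if Name fits in a Mach-O name field.
bool fitsMachOName(const char *Name) {
  return std::strlen(Name) <= MachONameLimit;
}
```

`__PSEUDO_PROBE`, `__probes`, and `__probe_descs` all fit, whereas a longer illustrative name such as `__pseudo_probe_descs` would not.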
Craig Topper
b7dc4ff0ab
[TargetLowering] Replace always true if with an assert. NFC (#189750)
We already returned for UADDSAT/USUBSAT leaving SADDSAT/SSUBSAT as the
only opcodes that can get here.
2026-03-31 15:21:04 -07:00
Yonah Goldberg
bf76fa7582
[AtomicExpandPass][NFC] Refactor processAtomicInstr to be more readable (#186547)
While working on
https://discourse.llvm.org/t/rfc-add-elementwise-modifier-to-atomicrmw/90134/5
I found this `processAtomicInstr` to be a little hard to read, with
casing on the instruction type all over the place. I think it reads
nicer to just case on the instruction type once.
2026-03-31 12:22:03 -07:00
Laxman Sole
da173bfbf5
[NVPTX] Do not emit .debug_pubnames and .debug_pubtypes for NVPTX backend (#187328)
This change adds a mechanism to stop emitting the `.debug_pubnames` and
`.debug_pubtypes` sections for a particular target.

This is particularly useful for cases where IR is generated by frontends
that do not explicitly disable these sections (as `Clang` does for
`NVPTX`), but still use `llc` for code generation.

Currently, only `NVPTX` uses this to disable these sections.
2026-03-31 12:13:39 -07:00
Medha Tiwari
9c64cb6dca
Fix emulated TLS alignment for large variables (#171037)
Fix emulated TLS alignment for larger variables (>= 32 bytes) to use
preferred alignment.

Fixes #167219
2026-03-31 09:37:06 -07:00
Luke Lau
e891812cac
[RISCV] Remove codegen for vp_minimum, vp_maximum (#189550)
Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off two intrinsics from #179622.
2026-03-31 15:12:18 +00:00
natanelh-mobileye
46dd9d6f52
[SDAG][abd] Combine abd of small types (#181538)
It is beneficial to combine abd of illegal, small types (types that get promoted to wider scalar size).
2026-03-31 13:40:51 +00:00
Jay Foad
fbfb83978c
[MachineVerifier] Disallow subregister defs in SSA form (#189403)
2026-03-31 09:50:08 +01:00
Luke Lau
598f3535fa
[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle legalization (#188691)
This is a second attempt at "[SelectionDAG] Expand
CTTZ_ELTS[_ZERO_POISON] and handle splitting" (#188220)

That PR had to be reverted in 7d39664a6ae8daaf186b65578492244d96a50bf2
because we had crashes on AMDGPU since we didn't have scalarization
support, and other crashes on PowerPC because we didn't handle the case
when a vector needed to be widened. Tests for these are added in
AMDGPU/cttz-elts.ll, RISCV/rvv/cttz-elts-scalarize.ll and
PowerPC/cttz-elts.ll.

The former crash has been fixed by adding
DAGTypeLegalizer::ScalarizeVecOp_CTTZ_ELTS.

The second crash has been fixed by reworking
TargetLowering::expandCttzElts. The expansion for CTTZ_ELTS is nearly
identical to VECTOR_FIND_LAST_ACTIVE, except it uses a reverse step
vector and subtracts the result from VF. The easiest way to fix these
crashes without introducing regressions is to reuse the
VECTOR_FIND_LAST_ACTIVE expansion which already handles the case where
the vector needs to be widened.

This means that the node now needs to take in a boolean vector argument
and uses VSELECT instead of an AND to zero out inactive lanes, so the op
promotion code has also been shared.
2026-03-31 07:25:57 +00:00
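The expansion described in 598f3535fa (reverse step vector, select on the mask, reduce, subtract from VF) can be modeled on a fixed-size array. This is a scalar sketch under the assumption that the reduction is an unsigned max; `cttzElts` is an illustrative name and the code is not the `TargetLowering::expandCttzElts` implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of CTTZ_ELTS: the index of the first active lane, or VF if
// no lane is active.
template <std::size_t VF>
unsigned cttzElts(const std::array<bool, VF> &Mask) {
  unsigned Max = 0;
  for (std::size_t I = 0; I < VF; ++I) {
    // Reverse step vector: lane I holds VF - I.
    unsigned Step = static_cast<unsigned>(VF - I);
    // Select: inactive lanes are zeroed out.
    unsigned Lane = Mask[I] ? Step : 0u;
    // Max reduction picks the earliest active lane.
    Max = std::max(Max, Lane);
  }
  // Subtracting from VF turns the reversed step back into an index.
  return static_cast<unsigned>(VF) - Max;
}
```

For the mask `{false, false, true, true}` the first active lane is 2; an all-false mask yields VF.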
Mingjie Xu
227edfb2f4
[CodeGenPrepare][NFC] Reland: Update the dominator tree instead of rebuilding it (#179040)
The original differential revision is https://reviews.llvm.org/D153638

Reverted in
f5b5a30858
because it caused a clang crash.

This patch relands it with the crash fixed. It calls `DTU->flush()` in
each iteration of the `while (MadeChange)` loop to flush all BasicBlocks
awaiting deletion and prevent iterator invalidation.
2026-03-31 09:01:11 +08:00
Demetrius Kanios
96bd7b6e15
[CodeGen] Add additional params to TargetLoweringBase::getTruncStoreAction (#187422)
The truncating store analogue of #181104.

Adds `Alignment` and `AddrSpace` parameters to
`TargetLoweringBase::getTruncStoreAction` and dependents, and introduces
a `getCustomTruncStoreAction` hook for targets to customize legalization
behavior using this new information.

This change is fully backwards compatible from the target's point of
view, with `setTruncStoreAction` having identical functionality. The
change is purely additive.
2026-03-30 16:52:45 -07:00
Simon Pilgrim
d74f098a30
[DAG] isKnownNeverNaN - fallback to computeKnownFPClass check (#189476)
Remove ConstantFPSDNode handling from isKnownNeverNaN and fallback to
using computeKnownFPClass if there are no opcode matches in
isKnownNeverNaN

The test check changes are due to isKnownNeverNaN not handling
UNDEF/POISON, whereas computeKnownFPClass does (POISON in particular now
returns isKnownNeverNaN == true, preventing an ISD::FCANONICALIZE call in
expandFMINNUM_FMAXNUM).
2026-03-30 21:49:15 +00:00
Bill Wendling
9d3079a7a9
[NFC][CodeGen] Prepare for expansion of InlineAsmPrepare (#189469)
Move some functions around so that the CallBrInst processing is
contained. The 'static' functions don't need to be declared at the top;
just place them before the calls. Fix the naming to use lower-case for
the first letter of function names.
2026-03-30 20:54:00 +00:00
Alexey Merzlyakov
06725d7ef5
[GISel] Keep non-negative info in SUB(CTLZ) (#189314)
Implement non-negative value tracking for SUB-CTLZ chains in GlobalISel,
matching the behavior previously added to SelectionDAG.

Additionally, refactor the SelectionDAG implementation from the previous
patch to improve performance and code density.

Related to https://github.com/llvm/llvm-project/issues/136516 and
https://github.com/llvm/llvm-project/pull/186338#discussion_r2980420174
2026-03-30 22:10:47 +02:00
Aiden Grossman
9331b5bb77
[DAG] Fix -Wunused-variable
A recently introduced local is only used in an assertion, which means we
get -Wunused-variable in release+noasserts builds. Mark it
[[maybe_unused]] rather than inlining the definition, given there are
multiple uses within the assert.
2026-03-30 17:51:42 +00:00
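The pattern from 9331b5bb77 looks roughly like the sketch below. `sumChecked` is a hypothetical function, not the code touched by the commit; it shows a local that is only read inside `assert`, which would otherwise trigger `-Wunused-variable` when `NDEBUG` is defined.

```cpp
#include <cassert>
#include <vector>

// Sums a vector that is expected to be non-empty.
int sumChecked(const std::vector<int> &V) {
  // Only used in the assertion below; with NDEBUG the assert expands to
  // nothing, so the attribute suppresses the unused-variable warning.
  [[maybe_unused]] const bool NonEmpty = !V.empty();
  assert(NonEmpty && "expected at least one element");

  int Sum = 0;
  for (int X : V)
    Sum += X;
  return Sum;
}
```

The attribute keeps the named local available for multiple uses inside assertions without changing release-build behavior.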
Alexis Engelke
bbef10d9f1
[CodeGen][NFC] Compute MaximumLegalStoreInBits just once (#189355)
Instead of iterating over all value types per basic block, pre-compute
the TLI-specific value once when constructing the TLI.
2026-03-30 18:44:18 +02:00
Anshul Nigham
7feb816ed0
[NFC] Removes unused Combiner dependency on TargetPassConfig (#188365)
This enables NewPM ports since it removes multiple pass dependencies on
`TargetPassConfig` which we don't want to port to the NewPM.

It looks like no derived classes of Combiner actually use this pointer,
and it is also unused in the Combiner class.
2026-03-30 08:58:22 -07:00
Xinlong Chen
aa22fca59a
[DAG] Add initial version of SelectionDAG::computeKnownFPClass (#188790)
This patch adds an initial skeleton for `SelectionDAG::computeKnownFPClass`.

The initial version includes:
- DemandedElts wrapper and max depth early-out
- `ConstantFPSDNode` and `BUILD_VECTOR` handling
- `TargetLowering::computeKnownFPClassForTargetNode` virtual hook for backend extensions

Initial test coverage for constant scalars, BUILD_VECTOR, and max depth
early-out is added in `AArch64SelectionDAGTest.cpp`.

closes #175571
2026-03-30 14:08:44 +00:00
Simon Pilgrim
7382a993b4
[DAG] SimplifyDemandedBits - limit BITCAST -> FGETSIGN fold to custom/legal scalar SimplifyDemandedBits cases (#189363)
All of the non-i32 zero_extend codepath is unaffected by this

Pulled out of the discussion on #189129
2026-03-30 14:02:05 +00:00
Jim Lin
2b41985405
[DAG] Fix incorrect ForSigned handling in computeConstantRange calls (#188889)
Fix two places where ForSigned was incorrectly passed to
computeConstantRange, causing wrong signed/unsigned range computation.

In computeConstantRangeIncludingKnownBits (DemandedElts overload),
the call omitted ForSigned, so Depth (unsigned) was implicitly
converted to bool for the ForSigned parameter. Introduced in
a6a66a4e6915.

In visitIMINMAX, the call always passed ForSigned=false, even when
folding SMAX/SMIN which query signed bounds from the resulting range.
2026-03-30 10:30:19 +00:00
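The first bug described in 2b41985405 is an instance of a general C++ pitfall: an `unsigned` argument silently converts to `bool` when a parameter is skipped. The signature below is hypothetical and only modeled on the description above; it is not the real `computeConstantRange` overload.

```cpp
#include <cassert>

// Records which arguments the callee actually received.
struct Query {
  bool ForSigned;
  unsigned Depth;
};

// Hypothetical signature: a bool flag followed by a defaulted depth.
Query computeRange(int Value, bool ForSigned, unsigned Depth = 0) {
  (void)Value; // The value itself is irrelevant to this illustration.
  return Query{ForSigned, Depth};
}

// Buggy call: Depth lands in the ForSigned slot and converts to bool.
Query buggyCall(int Value, unsigned Depth) {
  return computeRange(Value, Depth);
}

// Fixed call: both arguments are passed explicitly.
Query fixedCall(int Value, bool ForSigned, unsigned Depth) {
  return computeRange(Value, ForSigned, Depth);
}
```

With `Depth = 3`, the buggy call is received as `ForSigned = true, Depth = 0`, and it compiles without any diagnostic by default.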