38740 Commits

Author SHA1 Message Date
Benjamin Maxwell
135ddf1e8e
[AArch64][SVE] Add basic support for @llvm.masked.compressstore (#168350)
This patch adds SVE support for the `masked.compressstore` intrinsic via
the existing `VECTOR_COMPRESS` lowering and compressing the store mask
via `VECREDUCE_ADD`.

Currently, only `nxv4[i32|f32]` and `nxv2[i64|f64]` are directly
supported, with other types promoted to these, where possible.

This is done in preparation for LV support of this intrinsic, which is
currently being worked on in #140723.
2025-11-28 10:17:36 +00:00
Alex Bradbury
fd19a20a1a
Revert "[ShrinkWrap] Modify shrink wrapping to accommodate functions terminated by no-return blocks" (#169852)
Reverts llvm/llvm-project#167548

As commented at
https://github.com/llvm/llvm-project/pull/167548#issuecomment-3587008602
this is causing miscompiles in two-stage RISC-V Clang/LLVM builds that
result in test failures on the builders.
2025-11-27 19:12:56 +00:00
Nathan Corbyn
650eeb867f
[ShrinkWrap] Modify shrink wrapping to accommodate functions terminated by no-return blocks (#167548)
At present, the shrink wrapping pass misses opportunities to shrink wrap
in the presence of machine basic blocks which exit the function without
returning. Such cases arise from C++ functions like the following:
```cxx
int foo(int err, void* ptr) {
    if (err == -1) {
         if (ptr == nullptr) {
             throw MyException("Received `nullptr`!", __FILE__, __LINE__);
         }
         
         handle(ptr);
    }
    
    return STATUS_OK;
}
```
In particular, assuming `MyException`'s constructor is not marked
`noexcept`, the above code will generate a trivial EH landing pad
calling `__cxa_free_exception()` and rethrowing the unhandled internal
exception, exiting the function without returning. As such, the shrink
wrapping pass refuses to touch the above function, spilling to the stack
on every call, even though no CSRs are clobbered on the hot path. This
patch tweaks the shrink wrapping logic to enable the pass to fire in
this and similar cases.
2025-11-27 10:17:25 +00:00
Peter Collingbourne
6227eb90da
Add IR and codegen support for deactivation symbols.
Deactivation symbols are a mechanism for allowing object files to disable
specific instructions in other object files at link time. The initial use
case is for pointer field protection.

For more information, see the RFC:
https://discourse.llvm.org/t/rfc-deactivation-symbols/85556

Reviewers: ojhunt, nikic, fmayer, arsenm, ahmedbougacha

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/133536
2025-11-26 12:37:09 -08:00
Matt Arsenault
9b88cd9945
CodeGen: Remove PointerLikeRegClass handling from codegen (#159883)
All uses have been migrated to RegClassByHwMode. This is now
an implementation detail of InstrInfoEmitter for pseudoinstructions.
2025-11-26 10:14:37 -05:00
daniilavdeev
cc1c41724d
[dwarf] make dwarf fission compatible with RISCV relaxations 2/2 (#164813)
This patch makes DWARF fission compatible with RISC-V relaxations by
using indirect addressing for the DW_AT_high_pc attribute. This
eliminates the remaining relocations in .dwo files.
2025-11-26 15:36:02 +03:00
daniilavdeev
5f777b2c8f
[dwarf] make dwarf fission compatible with RISCV relaxations 1/2 (#166597)
Currently, -gsplit-dwarf and -mrelax are incompatible options in Clang.
The issue is that .dwo files should not contain any relocations, as they
are not processed by the linker. However, relaxable code emits
relocations in DWARF for debug ranges that reside in the .dwo file when
DWARF fission is enabled.

This patch makes DWARF fission compatible with RISC-V relaxations. It
uses the StartxEndx DWARF forms in .debug_rnglists.dwo, which allow
referencing addresses from .debug_addr instead of using absolute
addresses. This approach eliminates relocations from .dwo files.
2025-11-26 00:52:22 +03:00
Matt Arsenault
1c5b1501ca
CodeGen: Move libcall lowering configuration to subtarget (#168621)
Previously libcall lowering decisions were made directly
in the TargetLowering constructor. Pull these into the subtarget
to facilitate turning LibcallLoweringInfo into a separate analysis
in the future.
2025-11-25 11:59:56 -05:00
Drew Kersnar
17852deda7
[NVPTX] Lower LLVM masked vector loads and stores to PTX (#159387)
This backend support will allow the LoadStoreVectorizer, in certain
cases, to fill in gaps when creating load/store vectors and generate
LLVM masked load/stores
(https://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics). To
accomplish this, changes are separated into two parts. This first part
has the backend lowering and TTI changes, and a follow up PR will have
the LSV generate these intrinsics:
https://github.com/llvm/llvm-project/pull/159388.

In this backend change, Masked Loads get lowered to PTX with `#pragma
"used_bytes_mask" [mask];`
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
And Masked Stores get lowered to PTX using the new sink symbol syntax
(https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st).

# TTI Changes
TTI changes are needed because NVPTX only supports masked loads/stores
with _constant_ masks. `ScalarizeMaskedMemIntrin.cpp` is adjusted to
check that the mask is constant and pass that result into the TTI check.
Behavior shouldn't change for non-NVPTX targets, which do not care
whether the mask is variable or constant when determining legality, but
all TTI files that implement these API need to be updated.

# Masked store lowering implementation details
If the masked stores make it to the NVPTX backend without being
scalarized, they are handled by the following:
* `NVPTXISelLowering.cpp` - Sets up a custom operation action and
handles it in lowerMSTORE. Similar handling to normal store vectors,
except we read the mask and place a sentinel register `$noreg` in each
position where the mask reads as false.

For example, 
```
t10: v8i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>
t11: ch = masked_store<(store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)> t5:1, t5, t7, undef:i64, t10

->

STV_i32_v8 killed %13:int32regs, $noreg, $noreg, killed %16:int32regs, killed %17:int32regs, $noreg, $noreg, killed %20:int32regs, 0, 0, 1, 8, 0, 32, %4:int64regs, 0, debug-location !18 :: (store unknown-size into %ir.lsr.iv28, align 32, addrspace 1);

```

* `NVPTXInstInfo.td` - changes the definition of store vectors to allow
for a mix of sink symbols and registers.
* `NVPXInstPrinter.h/.cpp` - Handles the `$noreg` case by printing "_".

# Masked load lowering implementation details
Masked loads are routed to normal PTX loads, with one difference: a
`#pragma "used_bytes_mask"` is emitted before the load instruction
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
To accomplish this, a new operand is added to every NVPTXISD Load type
representing this mask.
* `NVPTXISelLowering.h/.cpp` - Masked loads are converted into normal
NVPTXISD loads with a mask operand in two ways. 1) In type legalization
through replaceLoadVector, which is the normal path, and 2) through
LowerMLOAD, to handle the legal vector types
(v2f16/v2bf16/v2i16/v4i8/v2f32) that will not be type legalized. Both
share the same convertMLOADToLoadWithUsedBytesMask helper. Both default
this operand to UINT32_MAX, representing all bytes on. For the latter,
we need a new `NVPTXISD::MLoadV1` type to represent that edge case
because we cannot put the used bytes mask operand on a generic
LoadSDNode.
* `NVPTXISelDAGToDAG.cpp` - Extract used bytes mask from loads, add them
to created machine instructions.
* `NVPTXInstPrinter.h/.cpp` - Print the pragma when the used bytes mask
isn't all ones.
* `NVPTXForwardParams.cpp`, `NVPTXReplaceImageHandles.cpp` - Update
manual indexing of load operands to account for new operand.
* `NVPTXInsrtInfo.td`, `NVPTXIntrinsics.td` - Add the used bytes mask to
the MI definitions.
* `NVPTXTagInvariantLoads.cpp` - Ensure that masked loads also get
tagged as invariant.

Some generic changes that are needed:
* `LegalizeVectorTypes.cpp` - Ensure flags are preserved when splitting
masked loads.
* `SelectionDAGBuilder.cpp` - Preserve `MD_invariant_load` on masked
load SDNode creation
2025-11-25 10:26:15 -06:00
Sander de Smalen
e1b08731e5 Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit bb78728826ff57f3df859e79bfd857b5a175bb6d.
2025-11-25 11:01:27 +00:00
Sander de Smalen
bb78728826
Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
A SUBREG_TO_REG instruction expresses that the top bits of the result
register are set to a certain value (e.g. 0).

The example below expresses that the result of %1 will have the top 32
bits zeroed and the lower 32bits being equal to the result of INSTR.
```
    %0:gpr32 = INSTR
    %1:gpr64 = SUBREG_TO_REG 0, %0, sub32
```
When the RegisterCoalescer tries to remove SUBREG_TO_REG instructions by
coalescing %0 into %1, it must keep the same semantics. Currently
however, the RegisterCoalescer would emit:
```
    %1.sub32:gpr64 = INSTR
```
which no longer expresses that the top 32-bits of the register are
defined (zeroed) by INSTR.

This may cause issues with e.g. machine copy propagation where the pass
may think it can remove a COPY-like instruction because the MIR says
only the bottom 32-bits are defined/used, even though other uses of the
register rely on the top 32-bits being zeroed by the COPY-like
instruction.

This PR changes the RegisterCoalescer to instead emit:
```
    undef %1.sub32:gpr64 = MOVimm32 42, implicit-def %1
```
to express that the entire contents of %1:gpr64 are defined by the
instruction.

This tries to reland #134408 which had to be reverted due to a few reported
failures.
2025-11-24 15:55:19 +00:00
hstk30-hw
a6cec3f3e5
Reland "[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)" (#169219)
Reland d5f3ab8ec97786476a077b0c8e35c7c337dfddf2, fix testcases.
2025-11-24 09:27:25 +08:00
Aiden Grossman
d5f3ab8ec9 Revert "[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)"
This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc.

This caused a couple test failures, likely due to a mid-air collision.
Reverting for now to get the tree back to green and allow the original
author to run UTC/friends and verify the output.
2025-11-23 05:17:45 +00:00
hstk30-hw
0859ac5866
[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661)
This maybe a bug which is introduced by commit
6749ae36b4a33769e7a77cf812d7cd0a908ae3b9, and has been present ever
since.
In this case, `OtherReg` always overlaps with `DstReg` cause they from
the `Copy` all.
2025-11-23 10:11:24 +08:00
Kazu Hirata
b296386d2c
[llvm] Use llvm::equal (NFC) (#169173)
While I am at it, this patch uses const l-value references for
std::shared_ptr.  We don't need to increment the reference count by
passing std::shared_ptr by value.

Identified with llvm-use-ranges.
2025-11-22 15:30:43 -08:00
Aiden Grossman
ad7a5d4e05 [CallBrPrepare] Prefer Function &F over Function &Fn
Function &F is the more standard abbreviation (~4000 uses in llvm versus
~300 uses).
2025-11-22 07:37:20 +00:00
Hongyu Chen
3fec26e329
[DAGCombiner] Don't optimize insert_vector_elt into shuffle if implicit truncation exists (#169022)
Fixes #169017
2025-11-22 03:33:53 +08:00
Matt Arsenault
1d73b68463
TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744)
Query RuntimeLibcalls for the support and the name. The check
that the implementation is exactly __guard_local instead of
unsupported feels a bit strange.
2025-11-20 20:44:25 -05:00
Craig Topper
01e5e4fd00
[DAGCombiner] Remove unneeded m_BitReverse from visitBITREVERSE. NFC (#168918)
We already know we're looking at BITREVERSE, we can match on the source
operand.
2025-11-20 18:20:47 +00:00
Matt Arsenault
0e1cb2de90
Reapply "DAG: Allow select ptr combine for non-0 address spaces" (#168292) (#168786)
This reverts commit 6d5f87fc4284c4c22512778afaf7f2ba9326ba7b.

Previously this failed due to treating the unknown MachineMemOperand
value as known uniform.
2025-11-20 12:13:46 -05:00
Ramkumar Ramachandra
602fa0c7ce
[SDAG] Fix whitespace errors (NFC) (#168897)
To make life easier for future contributors. Note that formatting
changes are due to git clang-format on the touched whitespace-error
lines.
2025-11-20 16:44:31 +00:00
Jeremy Morse
2cf550a040
[DebugInfo] Force early line-zero calls to have meaningful locations (#156850)
In functions that have been seriously deformed during optimisation,
there can be call instructions with line-zero immediately after frame
setup (see C reproducer in the test added). Our previous algorithms for
prologue_end ignored these, meaning someone entering a function at
prologue_end would break-in after a function call had completed. Prefer
instead to place prologue_end and the function scope-line on the line
zero call: this isn't false (it's the first meaningful instruction of the
function) and is approximately true. Given a less than ideal function,
this is an OK solution.
2025-11-20 10:20:47 +00:00
Craig Topper
8608344778
[CFIInserter] Turn a reachable llvm_unreachable into a report_fatal_error. (#168777)
This prevents it from being optimized out in non-asserts builds.

Update X86 test to remove REQUIRES: asserts and check for LLVM ERROR.
Add FileCheck to RISC-V test and remove UNSUPPORTED.

This is the more complete fix for #168772 and #168525.
2025-11-19 22:32:26 -08:00
Matt Arsenault
db20a7f2bc
DAG: Fix constructing a temporary TargetTransformInfo instance (#168480) 2025-11-20 01:19:23 -05:00
Carl Ritson
b1c4b55118
RenameIndependentSubregs: try to only implicit def used subregs (#167486)
Attempt to only define used subregisters when creating IMPLICIT_DEF fix
ups for live interval subranges. This avoids the appearance at the MIR
level of entire (wide) registers becoming live rather than relying only
on transient LiveIntervals dead definitions for unused subregisters.
2025-11-20 09:28:34 +09:00
Matt Arsenault
253ed52436
DAG: Use poison for some vector result widening (#168290) 2025-11-19 16:49:43 -05:00
Matt Arsenault
a757c4e74e
CodeGen: Add subtarget to TargetLoweringBase constructor (#168620)
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
2025-11-19 19:18:13 +00:00
Matt Arsenault
0b921f52cc
DAG: Use poison when splitting vector_shuffle results (#168176) 2025-11-19 12:27:08 -05:00
Ryan Cowan
58e6d02aa2
[AArch64][GlobalISel] Check unmergeSrc is a vector in matchCombineBuildUnmerge (#168692)
This aims to fix the crash in #168495, my combine rule was
missing a check that the source vector was in fact a vector. This then
caused the legality check to fail in this example as the concat was
trying to concat a non vector.

I have also gated the bitcast of the concat to only work on non-scalable
vectors as the mutation calls `getNumElements` which crashes when called
on a scalable vector.

Fixes #168495
2025-11-19 12:30:51 +00:00
陈子昂
e38529ddbb
[DAG] Update canCreateUndefOrPoison to handle ISD::VECTOR_COMPRESS (#168010)
Fixes #167710
2025-11-19 10:21:05 +00:00
Tom Tromey
1262acf4ec
Introduce DwarfUnit::addBlock helper method (#168446)
This patch is just a small cleanup that unifies the various spots that
add a DWARF expression to the output.
2025-11-18 22:59:36 +00:00
Craig Topper
1157a22134
[GISel] Use getScalarSizeInBits in LegalizerHelper::lowerBitCount (#168584)
For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The
lowering should be based on the element width.

I noticed this by inspection. No tests in tree are currently affected,
but I thought it would be good to fix so someone doesn't have to debug
it in the future.
2025-11-18 12:26:47 -08:00
Craig Topper
96e58b83a3
[RISCV] Legalize misaligned unmasked vp.load/vp.store to vle8/vse8. (#167745)
If vector-unaligned-mem support is not enabled, we should not generate
loads/stores that are not aligned to their element size.

We already do this for non-VP vector loads/stores.

This code has been in our downstream for about a year and a half after
finding the vectorizer generating misaligned loads/stores. I don't think
that is unique to our downstream.

Doing this for masked vp.load/store requires widening the mask as well
which is harder to do.

NOTE: Because we have to scale the VL, this will introduce additional
vsetvli and the VL optimizer will not be effective at optimizing any
arithmetic that is consumed by the store.
2025-11-18 11:13:54 -08:00
Hongyu Chen
523bd2df6d
[GISel][RISCV] Compute CTPOP of small odd-sized integer correctly (#168559)
Fixes the assertion in #168523
This patch lifts the small, odd-sized integer to 8 bits, ensuring that
the following lowering code behaves correctly.
2025-11-18 18:49:13 +00:00
Nathan Corbyn
93a8ca8fc7
[AArch64][GISel] Don't crash in known-bits when copying from vectors to non-vectors (#168081)
Updates the demanded elements before recursing through copies in case
the type of the source register changes from a non-vector register to a
vector register.

Fixes #167842.
2025-11-18 16:42:58 +00:00
Hassnaa Hamdi
3d5d32c605
[CGP]: Optimize mul.overflow. (#148343)
- Detect cases where LHS & RHS values will not cause overflow
(when the Hi halfs are zero).
2025-11-18 13:15:47 +00:00
David Green
4ecfaa602f
[AArch64][GlobalISel] Add better basic legalization for llround. (#168427)
This adds handling for f16 and f128 lround/llround under LP64 targets,
promoting the f16 where needed and using a libcall for f128. This
codegen is now identical to the selection dag version.
2025-11-18 12:05:02 +00:00
Sander de Smalen
f369a53d82
[DAGCombiner] Fold select into partial.reduce.add operands. (#167857)
This generates more optimal codegen when using partial reductions with
predication.

```
partial_reduce_*mla(acc, sel(p, mul(*ext(a), *ext(b)), splat(0)), splat(1))
-> partial_reduce_*mla(acc, sel(p, a, splat(0)), b)

partial.reduce.*mla(acc, sel(p, *ext(op), splat(0)), splat(1))
-> partial.reduce.*mla(acc, sel(p, op, splat(0)), splat(trunc(1)))
```
2025-11-18 09:49:42 +00:00
Aiden Grossman
472e4ab0b0
[MLGO] Fully Remove MLRegalloc Experimental Features (#168252)
20a22a45e96bc94c3a8295cccc9031bd87552725 was supposed to fully remove
these, but left around the functionality to actually compute them and a
unittest that ensured they worked. These are not development features in
the sense of features used in development mode, but experimental
features that have been superseded by MIR2Vec.
2025-11-17 10:07:48 -08:00
Ryan Cowan
d65be16ab6
[AArch64][GlobalISel] Add combine for build_vector(unmerge, unmerge, undef, undef) (#165539)
This PR adds a new combine to the `post-legalizer-combiner` pass. The
new combine checks for vectors being unmerged and subsequently padded
with `G_IMPLICIT_DEF` values by building a new vector. If such a case is
found, the vector being unmerged is instead just concatenated with a
`G_IMPLICIT_DEF` that is as wide as the vector being unmerged.

This removes unnecessary `mov` instructions in a few places.
2025-11-17 15:55:40 +00:00
David Green
22968f5b4a
[DAG] Add strictfp implicit def reg after metadata. (#168282)
This prevents a machine verifier error, where it "Expected implicit
register after groups".

Fixes #158661
2025-11-17 10:57:21 +00:00
Abinaya Saravanan
c946418330
[MachinePipeliner] Detect a cycle in PHI dependencies early on (#167095)
- This patch detects cycles by phis and bails out if one is found.
- It prevents to violate DAG restrictions.

Abort pipelining in the below case

%1 = phi i32 [ %a, %entry ], [ %3, %loop ]
%2 = phi i32 [ %a, %entry ], [ %1, %loop ]
%3 = phi i32 [ %b, %entry ], [ %2, %loop ]

---------

Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro@fujitsu.com>
2025-11-17 15:28:30 +05:30
pvanhout
853ed3b3b7 [InlineAsmLowering] unsigned -> TypeSize for getTypeStoreSize result 2025-11-17 10:21:43 +01:00
hstk30-hw
51c8180515
[GlobalMerge]Prefer use global-merge-max-offset instead of the target-specific constant offset. (#165591)
In the Dhrystone benchmark, I find some adjacent global not be merged,
on the contrary the GCC's anchor optimize is work. Use
global-merge-max-offset to set the max offset can yield similar results
(still slightly different, at least we can control the offset).
2025-11-17 15:37:51 +08:00
ronlieb
6d5f87fc42
Revert "DAG: Allow select ptr combine for non-0 address spaces" (#168292)
Reverts llvm/llvm-project#167909
2025-11-16 18:35:51 -05:00
Kazu Hirata
98d49d51c0
[CodeGen] Remove a redundant declaration (NFC) (#168285)
EnableFSDiscriminator is declared in DebugInfoMetadata.h.

Identified with readability-redundant-declaration.
2025-11-16 14:06:18 -08:00
Matt Arsenault
dd9bd3e8f0
DAG: Preserve poison in combineConcatVectorOfScalars (#168220) 2025-11-16 11:16:34 -08:00
Sergei Barannikov
97a60aa37a
[CodeGen] Turn MCRegUnit into an enum class (NFC) (#167943)
This changes `MCRegUnit` type from `unsigned` to `enum class : unsigned`
and inserts necessary casts.
The added `MCRegUnitToIndex` functor is used with `SparseSet`,
`SparseMultiSet` and `IndexedMap` in a few places.

`MCRegUnit` is opaque to users, so it didn't seem worth making it a
full-fledged class like `Register`.

Static type checking has detected one issue in
`PrologueEpilogueInserter.cpp`, where `BitVector` created for
`MCRegister` is indexed by both `MCRegister` and `MCRegUnit`.

The number of casts could be reduced by using `IndexedMap` in more
places and/or adding a `BitVector` adaptor, but the number of casts *per
file* is still small and `IndexedMap` has limitations, so it didn't seem
worth the effort.

Pull Request: https://github.com/llvm/llvm-project/pull/167943
2025-11-16 20:46:44 +03:00
Sergei Barannikov
e413343ca7
[SelectionDAG] Verify SDTCisVT and SDTCVecEltisVT constraints (#150125)
Teach `SDNodeInfoEmitter` TableGen backend to process `SDTypeConstraint`
records and emit tables for them. The tables are used by
`SDNodeInfo::verifyNode()` to validate a node being created.

This PR only adds validation code for `SDTCisVT` and `SDTCVecEltisVT`
constraints to keep it smaller.

Pull Request: https://github.com/llvm/llvm-project/pull/150125
2025-11-16 18:26:03 +03:00
AZero13
d831f8df52
[SelectionDAG] Fix AArch64 machine verifier bug when expanding LOOP_DEPENDENCE_MASK (#168221)
TargetConstant nodes don't match TableGen ImmLeaf patterns during
instruction selection. When this zero constant flows into the AArch64
CCMP formation code, the machine verifier hits an assertion in expensive
checks.

Fixes: #168227
2025-11-15 21:12:11 +00:00