575626 Commits

Author SHA1 Message Date
Yeongu Choe
df461c164c
[CIR][CodeGen] Implement __builtin_fpclassify (#187977)
I implemented the CIR version of the `__builtin_fpclassify` function.
2026-04-06 14:41:55 -07:00
Amir Ayupov
a8cf1a0352
[BOLT] Allow empty buildid in pre-aggregated profile addresses (#190675)
Allow `parseString()` to return an empty `StringRef` when the delimiter
appears at position 0. This enables parsing pre-aggregated profile
addresses with an omitted buildid but preserved colon (`:addr` format),
where the empty buildid corresponds to the main binary.

Previously, `parseString()` rejected zero-length fields by treating
`StringEnd == 0` the same as `StringRef::npos` (delimiter not found).
These are distinct situations: `npos` means no delimiter exists, while
`0` means the field before the delimiter is empty. The fix removes the
`StringEnd == 0` sub-condition so only the missing-delimiter case
errors.
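
The distinction between a missing delimiter and an empty field can be sketched as follows. This is an illustration only, not BOLT's `parseString()`: the helper name `fieldBefore` and its use of `std::string_view` are assumptions made for the example.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not BOLT's parseString()): returns the field that
// precedes the first `delim`, or std::nullopt when no delimiter exists.
std::optional<std::string_view> fieldBefore(std::string_view input,
                                            char delim) {
  const std::size_t end = input.find(delim);
  // Only a missing delimiter is an error. end == 0 is a valid, empty field.
  if (end == std::string_view::npos)
    return std::nullopt;
  return input.substr(0, end);
}
```

With this shape, an input such as `:addr` yields an empty field rather than an error, while an input without any colon is still rejected.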

The existing test for buildid-prefixed addresses is extended to also
verify that `:addr` input produces identical output to the plain-address
and non-empty-buildid variants.

Test Plan:
Added empty-buildid input file and extended
`pre-aggregated-perf-buildid.test` to run perf2bolt with `:addr` format
and diff the fdata output against the existing buildid-prefixed result.
2026-04-06 14:41:21 -07:00
Steven Wu
79e669f000
[CAS] Revert an unintentional change in #190634 (#190686)
Revert an unintentional change in #190634 that introduced an implicit
signed-to-unsigned cast.
2026-04-06 21:37:15 +00:00
Ehsan Amiri
8a11fe97a2
[DA] Require nsw for AddRecs involved in GCD test (#186892)
Similar to other tests, we add a check that the AddRecs used in the GCD
test are `nsw`. In this case, all recursively identified `AddRec`s are
also checked. Note that there is already a similar check in
`getConstantCoefficient` for expressions processed in that function.
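
For readers unfamiliar with the GCD test, here is a minimal sketch of the textbook form, assuming two accesses `A[a*i + c1]` and `A[b*j + c2]`. The helper name `gcdTestMayDepend` is hypothetical and this is not the DependenceAnalysis implementation. The arithmetic assumes the index expressions do not wrap, which is the kind of assumption an `nsw` requirement guards.

```cpp
#include <cstdint>
#include <numeric>

// Hypothetical helper: the equation a*i - b*j = delta (delta = c2 - c1) has
// an integer solution only if gcd(a, b) divides delta. Returns true when a
// dependence is still possible, false when the test disproves it.
bool gcdTestMayDepend(std::int64_t a, std::int64_t b, std::int64_t delta) {
  const std::int64_t g = std::gcd(a, b);
  if (g == 0)  // Both coefficients are zero, so a*i - b*j is always 0.
    return delta == 0;
  return delta % g == 0;
}
```

For example, `A[2*i]` and `A[2*j + 1]` give `gcd(2, 2) = 2`, which does not divide 1, so the two accesses can never touch the same element.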
2026-04-06 17:33:16 -04:00
Sergei Barannikov
62ce560f68
[lldb] Remove some unreachable code (NFC) (#190529)
`isRISCV()` check always returns false because we only get here if
`min_op_byte_size` and `max_op_byte_size` are equal, which is not true
for RISC-V.
Also, replace the `if (!got_op)` check with an `else`. The check is
equivalent to `if (min_op_byte_size != max_op_byte_size)`, and the `if`
above checks for the opposite condition.
2026-04-07 00:32:17 +03:00
Shilei Tian
ef715849d7
[NFC][AMDGPU] Add some debug prints to SIMemoryLegalizer (#190658) 2026-04-06 17:17:33 -04:00
Jared Hoberock
7087ece044
[MLIR][ExecutionEngine] Tolerate CUDA_ERROR_DEINITIALIZED in mgpuModuleUnload (#190563)
`mgpuModuleUnload` may be called from a global destructor (registered by
`SelectObjectAttr`'s `appendToGlobalDtors`) after the CUDA primary
context has already been destroyed during program shutdown. In this
case, `cuModuleUnload` returns `CUDA_ERROR_DEINITIALIZED`, which is
benign since the module's resources are already freed with the context.

## Reproduction

Any program that uses `gpu.launch_func` and is AOT-compiled (via
`mlir-translate --mlir-to-llvmir | llc | cc -lmlir_cuda_runtime`) will
print `'cuModuleUnload(module)' failed with '<unknown>'` on exit. This
is because `SelectObjectAttr` registers the module unload as a global
destructor, which runs after the CUDA primary context is released.

This script reproduces the error message from `mgpuModuleUnload` on my
system:

```
#!/bin/bash
set -e

LLVM_BUILD=${LLVM_BUILD:-$HOME/dev/git/llvm-project-22/build}

cat > /tmp/repro.mlir << 'MLIR'
func.func @main() {
  %c1 = arith.constant 1 : index
  gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
             threads(%tx, %ty, %tz) in (%bsx = %c1, %bsy = %c1, %bsz = %c1) {
    gpu.terminator
  }
  return
}
MLIR

$LLVM_BUILD/bin/mlir-opt /tmp/repro.mlir \
  -gpu-lower-to-nvvm-pipeline="cubin-format=fatbin" \
  | $LLVM_BUILD/bin/mlir-translate --mlir-to-llvmir -o /tmp/repro.ll

$LLVM_BUILD/bin/llc -relocation-model=pic -filetype=obj /tmp/repro.ll -o /tmp/repro.o

cc /tmp/repro.o \
  -L$LLVM_BUILD/lib -Wl,-rpath,$LLVM_BUILD/lib \
  -lmlir_cuda_runtime -lmlir_runner_utils -o /tmp/repro

echo "Running:"
/tmp/repro 2>&1
echo "Exit code: $?"
```
## Context

This matches how other projects handle the same shutdown ordering issue:
- Clang CUDA (D48613) switched module cleanup from
`__attribute__((destructor))` to `atexit()`
- GCC libgomp checks context validity before `cuModuleUnload`
- Apache TVM silently ignores `CUDA_ERROR_DEINITIALIZED` on module
unload

Fixes #170833
2026-04-06 21:11:58 +00:00
Joe Nash
af95b0a615
[AMDGPU] Remove implicit super-reg defs on mov64 pseudos (#190379)
The mov64 pseudo is split into two 32 bit movs, but those 32 bit movs
had the full 64-bit register still implicitly defined. VOPD formation is
affected, so we can emit more of them.
2026-04-06 21:11:06 +00:00
Jianhui Li
9bddf47198
[MLIR][XeGPU] Extend Wg-to-Sg Distribution of Multi-Reduction Op for round-robin layout (#189988)
This PR enhances the multi-reduction op pattern of the wg-to-sg distribution
pass:
1. Allow each sg to have multiple distributions of sg_data tiles.
2. Expand the slm buffer size.
3. Construct the layout based on the partially reduced vector and use
layout.computeDistributedCoords() to compute coordinates. The layout is
constructed so that the store is cooperative and the load overlaps with
neighbouring threads.
4. Perform the save and load.
2026-04-06 14:07:50 -07:00
Anshul Nigham
97d50c1490
[NewPM] Adds a port for AArch64PreLegalizerCombiner (#190567)
Standard porting (note that TargetPassConfig dependency was [removed
earlier](e27e7e4339)).

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2026-04-06 14:01:37 -07:00
Andrew
ee51de9836
[llvm-cov] add ability to show non executed test vectors for mc/dc coverage (#187517)
- Added the `-show-mcdc-non-executed-vectors` option
- Non-executed test vectors are now tracked
- When the option is present, they are written to the UI
2026-04-06 15:59:14 -05:00
Zile Xiong
d917027334
[llvm-cov] Guard against empty CountedRegions in findMainViewFileID (#189270)
When processing coverage generated from branch coverage mode, some
functions can reach findMainViewFileID with an empty CountedRegions
list. In that case the current logic still proceeds to infer the main
view file, even though there is no regular counted region available to
do so.

Return std::nullopt early when CountedRegions is empty.
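
The guard can be sketched roughly as below. The `CountedRegion` struct is reduced to a single hypothetical field and the selection step is a placeholder, so this shows only the early return, not llvm-cov's real `findMainViewFileID` logic.

```cpp
#include <optional>
#include <vector>

// Hypothetical, heavily reduced stand-in for a counted coverage region.
struct CountedRegion {
  unsigned fileID;
};

std::optional<unsigned>
findMainFileID(const std::vector<CountedRegion> &regions) {
  // Nothing to infer the main view file from: bail out early.
  if (regions.empty())
    return std::nullopt;
  // Placeholder for the real selection logic, which is more involved.
  return regions.front().fileID;
}
```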

This was observed when reproducing issue #189169 with:
  cargo llvm-cov --lib --branch

The issue appears related to branch-only coverage information being
recorded separately in CountedBranchRegions, while
findMainViewFileID currently only consults CountedRegions.
This patch is a defensive fix for the empty-region case; further
investigation may still be needed to determine whether branch regions
should participate in main view file selection.

Co-authored-by: Zile Xiong <xiongzile99@gmail.com>
2026-04-06 15:58:31 -05:00
Chinmay Deshpande
9033e872fd
[AMDGPU][GISel] RegBankLegalize rules for update_dpp (#190662) 2026-04-06 13:52:10 -07:00
Alexis Engelke
89665812f5
[Analysis][NFC] Use block numbers in BlockFrequencyInfo (#190669)
Block pointers are only stored while constructing the analysis, so the
value handle to catch erased blocks is no longer needed when using
stable block numbers.
2026-04-06 20:47:34 +00:00
Valentin Clement (バレンタイン クレメン)
92b595b9b4
[flang][cuda] Take associate into account for host array diagnostic (#190673) 2026-04-06 20:43:52 +00:00
Congzhe
fbe6d79465
[LoopFusion] Fix out-of-date LoopInfo being used during fusion (#189452)
This is a fix for
[187902](https://github.com/llvm/llvm-project/issues/187902), where
`LoopInfo` is not in a valid state at the beginning of `ScalarEvolution::createSCEVIter`.

The reason for the bug is that `mergeLatch()` is called at a place
where control flow and dominator trees have been updated but `LoopInfo`
has not completed the update yet. `mergeLatch()` calls into
`ScalarEvolution`, which uses `LoopInfo`, where an out-of-date `LoopInfo`
would result in a crash or unpredictable results.

This patch moves `mergeLatch()` to the place where `LoopInfo` has
completed its update and hence is in a valid state.
2026-04-06 16:35:28 -04:00
Steven Wu
1a0ca1019d
[CAS] Harden validate() against on-disk corruption (#190634)
Fixes found by fuzzer:

OnDiskTrieRawHashMap:
- Bounds-check data slot offsets in TrieVerifier::visitSlot() before
  calling getRecord(), preventing asData() assertion on out-of-bounds
  trie entries.
- Validate subtrie headers (NumBits, bounds) before constructing
  SubtrieHandle, preventing SEGV in getSlots() from corrupt NumBits.
- Validate arena bump pointer alignment, catching misaligned BumpPtr
  that would crash store() with an alignment assertion.
- Fix comma operator bug in getOrCreateRoot() where the
  compare_exchange_strong result was discarded, causing asSubtrie()
  assertion when RootTrieOffset was corrupted to zero.

OnDiskGraphDB:
- Reject invalid (zero) ref offsets in validate callback, preventing
  asData() assertion when corrupt data pool refs are resolved via
  recoverFromFileOffset().
- Validate DataRecordHandle layout flags before calling getTotalSize(),
  preventing llvm_unreachable on corrupt NumRefsFlags/DataSizeFlags.
- Validate data pool bump pointer alignment, catching misaligned
  BumpPtr that would crash store() in DataRecordHandle::constructImpl().
- Check data record refs offset alignment before calling getRefs(),
  preventing PointerUnion assertion from misaligned refs pointer.

MappedFileRegionArena:
- Convert assertions in initializeHeader() to errors so corrupted
  arena headers return an error on CAS open instead of crashing.
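
The comma-operator item under OnDiskTrieRawHashMap is easy to misread, so here is a minimal sketch of that class of bug. The exact code in `getOrCreateRoot()` is not reproduced here. `installRoot` and the convention that offset 0 means "no root yet" are assumptions for the example.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for publishing a root offset exactly once.
// Buggy shape: `(root.compare_exchange_strong(expected, fresh), true)`
// always evaluates to true, because the comma operator discards the result
// of the compare-exchange. The version below uses the result instead.
// Returns the offset that actually ended up installed.
std::uint64_t installRoot(std::atomic<std::uint64_t> &root,
                          std::uint64_t fresh) {
  std::uint64_t expected = 0;  // 0 means "no root yet" in this sketch.
  if (root.compare_exchange_strong(expected, fresh))
    return fresh;    // This caller installed the root.
  return expected;   // Another root was already there; the CAS loaded it.
}
```

On failure `compare_exchange_strong` writes the current value into `expected`, so the caller can use the existing root instead of assuming its own value was stored.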

Assisted-By: Claude
2026-04-06 13:33:22 -07:00
Arthur Eubanks
70d3dcaa64
Revert "[Inliner] Put inline history into IR as !inline_history metadata" (#190666)
Reverts llvm/llvm-project#190092

Crashes reported in
https://github.com/llvm/llvm-project/pull/190092#issuecomment-4194546908
2026-04-06 20:31:54 +00:00
Steven Wu
40d3949162
[CAS] Add llvm-cas-fuzzer for ObjectStore::validate() (#190635)
Add a fuzzer that creates an on-disk CAS database, stores objects, then
corrupts the on-disk data files using fuzzer-provided bytes and calls
validate(). The goal is that validate() should either succeed or return
an error, never crash.

The fuzzer supports 6 corruption modes: byte-level mutations, file
truncation, appending garbage, zeroing ranges, standalone file
corruption, and combined mutations with continued CAS operations.

Assisted-By: Claude
2026-04-06 13:31:51 -07:00
Jonas Devlieghere
950f1de70b
[lldb] Fix UUID tombstone Key (#190551)
This changes `DenseMapInfo<UUID>::getTombstoneKey()` to return a 1-byte
`{0xFF}` sentinel instead of the empty, default constructed UUID().
Returning the same key for the empty and tombstone value apparently
violates the `DenseMap` invariant.
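
A minimal sketch of the invariant, using a hypothetical `Key` type rather than lldb's `UUID` or LLVM's `DenseMapInfo`: an open-addressing table needs one sentinel for "never used" and a different one for "erased", otherwise a probe cannot tell where to stop.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical key type standing in for a UUID: just a byte sequence.
struct Key {
  std::vector<std::uint8_t> bytes;
  bool operator==(const Key &other) const { return bytes == other.bytes; }
};

// Sentinel for "slot never used": the default-constructed (empty) key.
Key emptyKey() { return Key{}; }

// Sentinel for "slot held an entry that was erased". It must compare
// unequal to emptyKey(), so a one-byte {0xFF} value is used here.
Key tombstoneKey() { return Key{{0xFF}}; }
```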
2026-04-06 13:25:34 -07:00
Brian Cain
2aa4100fa7
[compiler-rt] Add hexagon to libFuzzer supported architectures (#190297)
LibFuzzer builds successfully for Hexagon Linux.
2026-04-06 14:49:43 -05:00
Chinmay Deshpande
40d5a7d69e
[AMDGPU][UniformityAnalysis] Mark set_inactive and set_inactive_chain_arg as SourceOfDivergence (#190640)
`set_inactive` produces a result that varies per-lane based on the EXEC mask, even when both inputs are uniform.
2026-04-06 12:40:22 -07:00
Aadarsh Keshri
326593b4b4
[Support][Modules] Removed prepareForGetLock and its usages. Ensured parent directory exists when creating lock file. (#189888)
Following #187372
2026-04-06 12:37:32 -07:00
Lucas Ramirez
5e1162eebc
[CodeGen] Move rollback capabilities outside of the rematerializer (#184341)
The rematerializer implements support for rolling back
rematerializations by modifying MIs that should normally be deleted in
an attempt to make them "transparent" to other analyses. This involves:

1. setting their opcode to DBG_VALUE and
2. setting their read register operands to the sentinel register.

This approach has several drawbacks.

1. It forces the rematerializer to support tracking these "dead MIs"
(even if support is optional, these data-structures have to exist).
2. It is not actually clear whether this mechanism will interact well
with all other analyses. This is an issue since the intent of the
rematerializer is to be usable in as many contexts as possible.
3. In practice, it has shown itself to be relatively error-prone.

This commit removes rollback support from the rematerializer and moves
those capabilities to a rematerializer listener that can be instantiated
on-demand and implements the same functionality on top of standard
rematerializer operations. The rematerializer now actually deletes MIs
that are no longer useful after rematerializations, and has support for
re-creating them on-demand without requiring additional tracking on its
part.
2026-04-06 19:23:19 +00:00
Nerixyz
a2c9146da1
[lldb][NativePDB] Handle S_DEFRANGE_REGISTER_REL_INDIR (#190336)
Since #189401, LLVM and Clang generate `S_DEFRANGE_REGISTER_REL_INDIR`
for indirect locations. This adds support in LLDB.

The offset added after dereferencing is signed here - unlike in
`S_REGREL32_INDIR` (at least that's the assumption). So I updated
`MakeRegisterBasedIndirectLocationExpressionInternal` to handle the
signedness. This is the reason the MSVC test was changed here.

I didn't find a test case where LLVM emits the record with the `VFRAME`
register. Other than that, the clang test is similar to the MSVC one
except that the locations are slightly different.
2026-04-06 21:21:47 +02:00
Daniel Thornburgh
fecf609998
Reland "[LTO][LLD] Prevent invalid LTO libfunc transforms (#164916)" (#190642)
This reverts commit 1ec7e86b3a779df2a0af3f37e58c8f5b3a398d7f after issue
#190072 was fixed.
2026-04-06 19:20:45 +00:00
Henry Jiang
412d6941e3
[VFS] Guard against null key/value nodes when parsing YAML overlay (#190506)
When a VFS overlay YAML file contains malformed content such as tabs,
the YAML parser can produce KeyValueNode entries where `getKey` returns
nullptr. The VFS overlay parser then passes the nullptr to
`parseScalarString`, which then calls dyn_cast.

Switch to `dyn_cast_if_present` for the above callsites and a few more.
2026-04-06 12:10:26 -07:00
Keith Smiley
04e2be73a6
[bazel] Fix TestingSupport layering_check (#190630)
I'm not sure if this header is public API upstream but we are using it
that way anyways.
2026-04-06 12:03:45 -07:00
Brian Cain
ab43cb8520
[Hexagon] Pass -pie to linker when PIE is the toolchain default (#189723)
The Hexagon driver only checked for an explicit -pie flag when
constructing the link command, ignoring the toolchain's PIE default. For
linux-musl targets, isPIEDefault() returns true (via the Linux toolchain
base class), so the compiler generates PIC/PIE code (-pic-level 2
-pic-is-pie) but the linker never received -pie.

This mismatch caused LTO failures: without -pie the linker sets
Reloc::Static for the LTO backend, which generates GP-relative
(small-data) references that lld cannot resolve.

Use hasFlag() to respect the toolchain default, and guard the -pie
emission against -shared and -r (relocatable) modes.
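
The decision described above can be sketched on plain strings. This is not the clang driver code: `hasFlag` here is a hypothetical miniature of a "last flag wins, otherwise use the default" query, and `-no-pie` as the negative spelling is an assumption of the example.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of a "last flag wins, else default" query.
bool hasFlag(const std::vector<std::string> &args, const std::string &pos,
             const std::string &neg, bool defaultValue) {
  bool result = defaultValue;
  for (const std::string &arg : args) {
    if (arg == pos)
      result = true;
    else if (arg == neg)
      result = false;
  }
  return result;
}

// Pass -pie to the linker only for PIE executables, never for -shared or -r.
bool shouldLinkPIE(const std::vector<std::string> &args, bool pieIsDefault) {
  for (const std::string &arg : args)
    if (arg == "-shared" || arg == "-r")
      return false;
  return hasFlag(args, "-pie", "-no-pie", pieIsDefault);
}
```

With `pieIsDefault` set to true and no explicit flag, the sketch returns true, which is the case the original explicit-flag-only check missed.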
2026-04-06 13:58:43 -05:00
Stanislav Mekhanoshin
de0a81091b
[AMDGPU] Update vop3-literal.s to use fake16 on gfx1250. NFC (#190243)
16-bit instructions there are in fake16 mode and shall also be
compatible with older targets. The purpose of the test is to
check literals, so fake16 or real16 is not important.
2026-04-06 11:50:15 -07:00
Alexis Engelke
a105f27f61
[Scheduler][NFC] Don't use set to track visited nodes (#190480)
The visited set can grow rather large and we can use an unused field in
SDNode to store the same information without the use of a hash set.

This improves compile times: stage2-O3 -0.14%.
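
As a rough illustration of the idea, and not the scheduler's actual code, a traversal can mark nodes through a field on the node itself instead of inserting them into a hash set. The `Node` type, the `visitMark` field, and the per-traversal `epoch` scheme are all assumptions of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical graph node; `visitMark` plays the role of the spare field.
struct Node {
  std::vector<Node *> operands;
  unsigned visitMark = 0;
};

// Counts the nodes reachable from `root`. Each traversal passes a fresh
// nonzero `epoch`, so marks never need clearing and no set is allocated.
std::size_t countReachable(Node *root, unsigned epoch) {
  std::vector<Node *> worklist{root};
  std::size_t count = 0;
  while (!worklist.empty()) {
    Node *n = worklist.back();
    worklist.pop_back();
    if (n->visitMark == epoch)
      continue;  // Already visited in this traversal.
    n->visitMark = epoch;
    ++count;
    for (Node *op : n->operands)
      worklist.push_back(op);
  }
  return count;
}
```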
2026-04-06 18:37:26 +00:00
Kirill Stoimenov
cdbb1f5014
Revert "[InstCombine] Fix #163110: Support peeling off matching shifts from icmp operands via canEvaluateShifted" (#190638)
Reverts llvm/llvm-project#165975

Breaks Sanitizer bots:
https://lab.llvm.org/buildbot/#/builders/52/builds/16329
2026-04-06 11:30:36 -07:00
vporpo
8d442bc5b5
[SandboxVec][LoadStoreVec] Add support for constants (#189769)
Up until now the pass would only vectorize load-store pairs. This patch
implements vectorization of constant-store pairs.
2026-04-06 11:25:20 -07:00
neonetizen
e11a31f4c7
[CIR][AArch64] Lower FP16 vduph lane intrinsics (#186955)
From #185382 

Lower `vduph_lane_f16` and `vduph_laneq_f16` to `cir::VecExtractOp`

Tests moved from `v8.2a-neon-instrinsics-generic.c` to a new CIR-enabled
test file.

I tried following from notes made in #185852 (BF16)
2026-04-06 19:12:34 +01:00
SiliconA-Z
5c13d2f099
[ARM] Enable creation of ARMISD::CMN nodes (#163223)
Map ARMISD::CMN to tCMN instead of armcmpz.

Rename the cmn instructions to match this new reality.

Please note that I do not have merge permissions.
2026-04-06 20:05:14 +02:00
Craig Topper
38034d42bd
[RISCV] Use EVT instead of MVT in compressShuffleOfShuffles. (#190636)
For the test case I just grabbed a test that exercised this code path
and made the VT non-simple.

Fixes #190605.
2026-04-06 11:03:38 -07:00
Chinmay Deshpande
12e957fd7f
[AMDGPU][GISel] RegBankLegalize rules for amdgcn_inverse_ballot (#190629) 2026-04-06 10:30:35 -07:00
Tomer Shafir
37801e9e99
[MCA] Enhance debug prints of processor resources (#190132)
Previously, `computeProcResourceMasks()` would print resource masks on
debug mode from multiple call sites, creating noise in the debug output.
This patch aims to fix this and also print more info about the
resources.

It splits the debug prints for resources into 2 types:

1. No simulation - mask only
2. Simulation - mask + other info

For 2, it shares printing in a single place in the `ResourceManager`
constructor, which should cover all the other simulation cases
indirectly:

1. `llvm/lib/MCA/HardwareUnits/ResourceManager` - covered
2. `llvm/lib/MCA/InstrBuilder.c` - should be covered indirectly - only
used by `llvm-mca` before simulation that constructs a `ResourceManager`
3. `llvm/tools/llvm-mca/Views/SummaryView.cpp` - after simulation that
constructs a `ResourceManager`
4. `llvm/tools/llvm-mca/Views/BottleneckAnalysis.cpp` - after simulation
that constructs a `ResourceManager`

It also adds `BufferSize` to the output, which should be useful to debug
scheduling model + MCA integration.

For 1, it inlines mask-only printing into 2 other callers:

1. `llvm/include/llvm/MCA/Stages/InstructionTables.h`
2. `llvm/tools/llvm-exegesis/lib/SchedClassResolution.cpp`

as they only use the masks there. I think this is a reasonable
duplication across distinguishably different users/tools.

Now every pair of callers, even across groups (1 and 2), effectively
print in a mutually exclusive way.

The patch adds debug tests for the 3 new callers, in the corresponding
root test directories, to drive further location of logically
target-independent tests that just require some target at the root. I
think this convention is more discoverable, and is pretty widely used in
the project.
2026-04-06 20:27:18 +03:00
Arthur Eubanks
72d4ce9889
[Inliner] Put inline history into IR as !inline_history metadata (#190092)
So that it's preserved across all inline invocations rather than just
one inliner pass run.

This prevents cases where devirtualization in the simplification
pipeline uncovers inlining opportunities that should be discarded due to
inline history, but we dropped the inline history between inliner pass
runs, causing code size to blow up, sometimes exponentially.

For compile time reasons, we want to limit this to only call sites that
have the potential to inline through SCCs, potentially with the help of
devirtualization. This means that the callee is in a non-trivial
(Ref)SCC, or the call site was previously an indirect call, which can
potentially be devirtualized to call any function.

The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant
even with this change, as monster_scc.ll blows up if I remove that code.


http://llvm-compile-time-tracker.com/compare.php?from=e830d88e8ae5f44a97cc76136a0a4e83aa9157c0&to=ed535e732fc41b79ab8efda2417886cbd0812f7f&stat=instructions:u

Fixes #186926.
2026-04-06 10:24:41 -07:00
vangthao95
eb065bf028
AMDGPU/GlobalISel: RegBankLegalize rules for G_EXTRACT_VECTOR_ELT (#189144) 2026-04-06 10:22:11 -07:00
Andrzej Warzyński
38c53b3eb9
[clang][cir][nfc] Fix comments, add missing EOF (#190623) 2026-04-06 18:06:57 +01:00
Craig Topper
b44d2c977c
[RISCV] Use a vector MemVT when converting store+extractelt into a vector store. (#190107)
This is needed so that `allowsMemoryAccessForAlignment` checks for
unaligned vector memory support instead of unaligned scalar memory support
when called from `RISCVTargetLowering::expandUnalignedVPStore`.

While there, remove the incorrect setting of the truncating store flag on
the vector instruction. And restrict the transform to simple stores since
we don't have tests for volatile or atomic.

Fixes #189037
2026-04-06 09:58:04 -07:00
Craig Topper
0d14772a91
[RISCV][P-ext] Add isel patterns for macc*.h00/macc*.w00. (#190444)
The RV32 macc*.h00 instructions take the lower half words from rs1 and
rs2, compute the full word product by extending the inputs, and
add to rd. The RV64 macc*.w00 is similar but operates on words
and produces a double word result.

I've restricted this to the case where the multiply has a single use.
We don't have a general macc that multiplies the full xlen bits
of rs1 and rs2, so I'm allowing the input to be sext_inreg/and or
have sufficient sign/zero bits according to
ComputeNumSignBits/computeKnownBits.

We should also add mul*.h00/mul.*w00 patterns, but those we should
restrict to at least one input being sext_inreg/and and prefer
regular mul when there are no sext_inreg/and.
2026-04-06 09:57:29 -07:00
Wooseok Lee
0bef4c7aab
[AMDGPU] Add v2i32 and/or patterns for VOP3 AND_OR and OR3 operations (#188375)
Add ThreeOp_v2i32_Pats pattern class to support v2i32 vector operations
for AND_OR_B32 and OR3_B32 instructions. The new patterns check the
v2i32 and-or or or-or instruction sequence, extract individual 32-bit
elements from v2i32 operands, and apply the and_or or or3 vop3
operations.
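
As a reference for what the patterns compute, here is a scalar sketch assuming the usual three-operand semantics, `(a & b) | c` for AND_OR_B32 and `a | b | c` for OR3_B32, applied to each 32-bit element of a two-element vector. The type alias `V2` and the function names are hypothetical, and this is not the TableGen pattern itself.

```cpp
#include <array>
#include <cstdint>

using V2 = std::array<std::uint32_t, 2>;  // stand-in for a v2i32 value

// Assumed scalar semantics of the two three-operand instructions.
std::uint32_t andOr(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
  return (a & b) | c;
}
std::uint32_t or3(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
  return a | b | c;
}

// Extract each 32-bit element, apply the scalar op, and rebuild the vector.
V2 andOrV2(const V2 &a, const V2 &b, const V2 &c) {
  return {andOr(a[0], b[0], c[0]), andOr(a[1], b[1], c[1])};
}
V2 or3V2(const V2 &a, const V2 &b, const V2 &c) {
  return {or3(a[0], b[0], c[0]), or3(a[1], b[1], c[1])};
}
```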
2026-04-06 16:54:21 +00:00
Domenic Nutile
5b33f85a08
[AMDGPU] Change isSingleLaneExecution to account for WWM enabling lanes even if there's only one workitem (#188316)
This issue was discovered during some downstream work around Vulkan CTS
tests, specifically
`dEQP-VK.subgroups.arithmetic.compute.subgroupadd_float`

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2026-04-06 12:51:46 -04:00
forking-google-bazel-bot[bot]
e7ac60c56b
[Bazel] Fixes ce1a9fd (#190577)
This fixes ce1a9fd76640929fe340c5c5d1bb493ea09ca9bc.

Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
2026-04-06 09:40:22 -07:00
Valentin Clement (バレンタイン クレメン)
baa1e5008b
[flang][cuda] Do not consider kernel result as host variable (#190626) 2026-04-06 16:39:38 +00:00
adams381
9265f9284c
[mlir][ABI] Add writable, dead_on_unwind, dead_on_return, nofpclass param attrs to LLVM dialect (#188374)
The MLIR LLVM dialect is missing support for several parameter attributes
that exist in LLVM IR: `writable`, `dead_on_unwind`, `dead_on_return`, and
`nofpclass`. This adds them to the kind-to-name mapping in
`AttrKindDetail.h` and the corresponding name accessors in `LLVMDialect.td`.

The existing generic conversion infrastructure in `ModuleTranslation` and
`ModuleImport` picks them up automatically: `writable` and `dead_on_unwind`
round-trip as `UnitAttr`, while `dead_on_return` and `nofpclass` round-trip
as `IntegerAttr`.

CIR needs these to match classic codegen's ABI output (sret gets
`writable dead_on_unwind`, indirect args get `dead_on_return`, fast-math FP
args get `nofpclass`).
2026-04-06 11:26:11 -05:00
Henrich Lauko
348295ac05
[CIR] Use data size in emitAggregateCopy for overlapping copies (#186702)
Add skip_tail_padding property to cir.copy to handle potentially-overlapping
subobject copies directly, instead of falling back to cir.libc.memcpy. When
set, the lowering uses the record's data size (excluding tail padding) for
the memcpy length. This keeps typed semantics and promotability of cir.copy.

Also fix CXXABILowering to preserve op properties when recreating
operations, and expose RecordType::computeStructDataSize() for computing
data size of padded record types.
2026-04-06 18:24:10 +02:00
Eric Feng
930ef7736e
[mlir][amdgpu] Add optional write mask to amdgpu.global_load_async_to_lds (#190498) 2026-04-06 09:21:32 -07:00