547939 Commits

Author SHA1 Message Date
Simon Pilgrim
b83f7f195c
[Headers][X86] Update SSE/AVX and/andnot/or/xor intrinsics to be used in constexpr (#152305) 2025-08-07 08:16:26 +01:00
Matt Arsenault
f44d8d583c
AMDGPU: Add a few missing mfma rewrite tests (#152434)
Test other splitting situations that appear in greedy.
This includes ensuring we have a case that hits a local split
and instruction split (most of the tests hit the region split path).

Also test a few cases where the final result isn't fully used, resulting
in partial copy bundles instead of a simple full copy. Test physreg
and virtreg agpr interference with a reassignment candidate.

I'm accumulating too many failure cases, and MIR tests are very prone
to painful merge conflicts, so I've added a few more tests and extracted
new tests from #147975.

Closes #149026
2025-08-07 16:14:45 +09:00
Matthias Springer
a3e0685529
[mlir][Transforms] More detailed error message when new IR cannot be legalized (#152297)
Print a more detailed error message when new/modified IR could not be
legalized with `allowPatternRollback = false`. This is useful to
understand why a pattern is incompatible with the new One-Shot Dialect
Conversion driver.

---------

Co-authored-by: Jeremy Kun <jkun@google.com>
2025-08-07 09:14:24 +02:00
Matt Arsenault
1110e2ff9f
InlineFunction: Split inlining into predicate and apply functions (#134213)
This is to support a new inline function reduction in llvm-reduce,
which should pre-filter callsites that are not eligible for inlining.

This code was mostly structured as a match and apply, with a few
exceptions. The ugliest piece is for propagating and verifying
compatible getGC and personalities. Also, collection of EHPad and the
convergence token to use are now cached in InlineFunctionInfo.

I was initially confused by the split between the checks performed here
and isInlineViable, so better document how this system is supposed to
work. It turns out this split does make sense, in that isInlineViable
checks if it's possible based on the callee content and the ultimate
inline depends on the callsite context. I think more renames of these
functions would help, and isInlineViable should probably move out of
InlineCost to be with these transforms.
2025-08-07 16:13:36 +09:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
2025-08-07 09:12:40 +02:00
Simon Pilgrim
edad89e4e0
[Headers][X86] Update MMX arithmetic intrinsics to be used in constexpr (#152296)
Update the easy add/sub/mul/logic/cmp/scalar_to_vector intrinsics to be
constexpr compatible.

I'm not expecting anyone to be very interested in using MMX intrinsics,
but they're smaller than the other types and are useful to test the
constexpr handling and test methods before we start applying them to
SSE/AVX2/AVX512 intrinsics.
2025-08-07 08:05:05 +01:00
Florian Hahn
a485e0eae0
[VPlan] Retrieve vector TC for epilogue from resume phi (NFC).
Instead of relying on getOrCreateVectorTripCount to initialize
EPI.VectorTripCount, delay initialization after we retrieved the resume
phi and get the trip count from there. This makes the code independent
of legacy vector trip count creation.
2025-08-07 07:52:35 +01:00
Matthias Springer
71832a3139
[mlir][Transforms] Make lookup without type converter unambiguous (#151747)
When a conversion pattern is initialized without a type converter, the
driver implementation currently looks up the most recently mapped value.
This is undesirable because the most recently mapped value could be a
materialization. I.e., the type of the value being looked up could
depend on which other patterns have run before. Such an implementation
makes the type conversion infrastructure fragile and unpredictable.

The current implementation also contradicts the documentation in the
markdown file. According to that documentation, the values provided by
the adaptor should match the types of the operands of the matched
operation when running without a type converter. This mechanism is not
desirable, either, for two reasons:

1. Some patterns have started to rely on receiving the most recently
mapped value. Changing the behavior to the documented behavior will
cause regressions. (And there would be no easy way to fix those without
forcing the use of a type converter or extending the `getRemappedValue`
API.)
2. It is more useful to receive the most recently mapped value. A value
of the original operand type can be retrieved by using the operand of
the matched operation. The adaptor is not needed at all in that case.

To implement the new behavior, materializations are now annotated with a
marker attribute. The marker is needed because not all
`unrealized_conversion_cast` ops are materializations that act as "pure
type conversions". E.g., when erasing an operation, its results are
mapped to newly-created "out-of-thin-air values", which are
materializations (with no input) that should be treated like regular
replacement values during a lookup. This marker-based lookup strategy is
also compatible with the One-Shot Dialect Conversion implementation
strategy, which does not utilize the mapping infrastructure anymore and
queries all necessary information by examining the IR.
2025-08-07 08:41:28 +02:00
Matthias Springer
0a72e6ddac
[mlir][Transforms] ConversionPatternRewriter: Add config getter (#152310)
Add a helper function to `ConversionPatternRewriter` that returns the
dialect conversion configuration. This is useful when migrating
conversion patterns to the new One-Shot Conversion Driver: patterns can
check if they are running in rollback mode or not. They can then work
around API changes and make sure that the pattern keeps working with
both the old and new driver.

Also remove the `config` field from `OperationLegalizer`. That field was
never needed.
2025-08-07 08:33:24 +02:00
Ziqing Luo
0abf4975bb
[-Wunsafe-buffer-usage] Do not warn about class methods with libc function names (#151270)
This commit fixes a false positive where C++ class methods with libc
function names were incorrectly warned about. For example,

```
struct T {void strcpy() const;};
void test(const T& t) {  t.strcpy(); } // no warn
```

rdar://156264388
2025-08-07 14:31:13 +08:00
Madhur Amilkanthwar
13daf3b70c
[GVN-PRE][Tests] Add MSSA coverage to some more tests [4/N] (#151919)
This should be the final PR for tests under PRE.
2025-08-07 11:16:07 +05:30
Valentin Clement (バレンタイン クレメン)
35f003d13b
[flang][cuda] Fix buildbot after #152418 (#152437) 2025-08-06 22:24:35 -07:00
Valentin Clement (バレンタイン クレメン)
eb0ddba26b
Reland "[flang][cuda] Set the allocator of derived type component after allocation" (#152418)
Reviewed in #152379
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
2025-08-06 21:49:55 -07:00
Princeton Ferro
9a592d9a84
[NVPTX] lower VECREDUCE min/max to 3-input on sm_100+ (#136253)
Add support for 3-input fmaxnum/fminnum/fmaximum/fminimum introduced in
PTX 8.8 for sm_100+:
- Use a tree reduction when 3-input operations are supported and the
  reduction has the `reassoc` flag.
- If not on sm_100+/PTX 8.8, fall back to 2-input operations and use the
  default shuffle reduction.
2025-08-06 21:45:21 -07:00
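The tree reduction described in the NVPTX commit above can be sketched in plain C++. This is only an illustration of the idea, not the backend's lowering: `max3` and `treeReduceMax3` are hypothetical names, NaN semantics of fmaxnum/fmaximum are ignored, and the input is assumed to be non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a single 3-input max operation.
inline float max3(float a, float b, float c) {
  return std::max(a, std::max(b, c));
}

// Tree reduction: every pass folds up to three neighbouring values into one,
// so the number of live values shrinks by roughly a factor of three per pass.
// Regrouping operands like this is only valid when reassociation is allowed.
// Assumes `v` is non-empty.
inline float treeReduceMax3(std::vector<float> v) {
  while (v.size() > 1) {
    std::vector<float> next;
    for (std::size_t i = 0; i < v.size(); i += 3) {
      if (i + 2 < v.size())
        next.push_back(max3(v[i], v[i + 1], v[i + 2]));
      else if (i + 1 < v.size())
        next.push_back(std::max(v[i], v[i + 1])); // 2-input tail
      else
        next.push_back(v[i]); // single leftover value
    }
    v = std::move(next);
  }
  return v.front();
}
```

With eight inputs this takes two passes (8 -> 3 -> 1), whereas a 2-input tree would take three.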
Luke Lau
a04142f11f
[LV][RISCV] Add check lines for scalable interleave costs. NFC
Previously we could only scalably vectorize interleave groups with
factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now
support all factors (available on RISC-V). So this adds the remaining
check lines for the scalable VFs.
2025-08-07 12:28:12 +08:00
Sharjeel Khan
d9f9064cfa
[ubsan_minimal] Add address argument to Android's abort message function (#152419)
https://github.com/llvm/llvm-project/pull/152192 forgot to make the
argument changes to Android code in UBsan minimal causing a build error
for Android LLVM:
```
/b/f/w/src/git/out/llvm-project/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp:102:3: error: no matching function for call to 'format_msg'
  102 |   format_msg(kind, caller, msg_buf, msg_buf + sizeof(msg_buf));
      |   ^~~~~~~~~~
/b/f/w/src/git/out/llvm-project/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp:37:13: note: candidate function not viable: requires 5 arguments, but 4 were provided
   37 | static void format_msg(const char *kind, uintptr_t caller,
      |             ^          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   38 |                        const uintptr_t *address, char *buf, const char *end) {
```
This change adds the address argument to abort_with_message just like
__ubsan_report_error_fatal so it can be passed to format_msg.
2025-08-06 20:36:21 -07:00
Luke Lau
44af26ea2e
[LV] Fix EVL test after merge. NFC
Test was modified in both 25d1285eecbab731eaf418c8aab44e4eb5f9e538 and
df8da2ff8370fda479b5c118704af4f50e0d3536
2025-08-07 11:12:43 +08:00
Vitaly Buka
0168324523
[CI] Test compiler-rt when it's changed (#152425) 2025-08-06 20:02:48 -07:00
Luke Lau
df8da2ff83
[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110)
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.

We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.

We also need to move the creation of the backedge value to the latch so
that EVL dominates it.

With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.

The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
2025-08-07 10:54:24 +08:00
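To see why the VPlan commit above steps the pointer induction by EVL rather than by VF, consider a scalar model of a tail-folded loop. This is a simplified sketch under stated assumptions: `evlPerIteration` is a hypothetical helper, and EVL is modelled as `min(VF, remaining)`, whereas real hardware may also choose a smaller EVL on the penultimate iteration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the number of elements processed by each iteration of a loop over
// `n` elements with vectorization factor `vf`, using explicit-vector-length
// tail folding. Assumes vf > 0.
inline std::vector<std::size_t> evlPerIteration(std::size_t n, std::size_t vf) {
  std::vector<std::size_t> steps;
  for (std::size_t i = 0; i < n;) {
    // Simplified model of EVL: never more than what is left.
    std::size_t evl = std::min(vf, n - i);
    steps.push_back(evl);
    // The induction (and any pointer derived from it) advances by EVL,
    // not by VF, so the last iteration does not overshoot.
    i += evl;
  }
  return steps;
}
```

For 10 elements and VF = 4 the steps are 4, 4, 2; an induction that always advanced by VF would end at 12 instead of 10.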
Valentin Clement (バレンタイン クレメン)
a196281896
[flang][cuda] Remove meaningless warning on CUDA shared arguments (#152404)
The warning issued during the compatibility check makes little sense.
Just remove it as it is confusing.
2025-08-06 18:50:07 -07:00
Valentin Clement (バレンタイン クレメン)
2696e8c149
[flang][cuda] Remove too restrictive assert for data transfer (#152398)
When the rhs is an array element, the assert was triggered but this is
still a valid transfer. Remove the assert. The operation has a verifier
to check its validity.
2025-08-06 18:49:52 -07:00
Jorge Gorbe Moya
8381f95dec
[bazel] Fix mlir/tests after 281e6d2cc498d05f3ca601e3b1d595420e7ed827 (#152413) 2025-08-06 20:36:55 -05:00
Farzon Lotfi
04672e20d4
[DirectX] ForwardHandle needs to check if globals were stored on allocas (#151751)
Fixes #140819

The SROA pass is making it so that some globals get loaded into stack
allocations. This means we find an alloca where we used to expect a load
and now need to walk an alloca -> store -> maybe load chain before we
find the global. Doing so fixes all but two instances of #137715 and
fixes every instance of `Load of "8.sroa.0" is not a global resource
handle` we are currently seeing in the DML shaders.
2025-08-06 21:14:35 -04:00
Thurston Dang
01472d8e35
[NFC][asan] Update shadow mapping comments for AArch64 non-Android Linux (#152412)
This adds commentary to explain why ASan does not work for AArch64
non-Android Linux with 39-bit and 42-bit VMAs (e.g.,
https://github.com/llvm/llvm-project/issues/145259).

Additionally, it updates the 42-bit VMA shadow map comment, which has
been outdated for the last 10 years
(18b2258c92df93c83bc7fce94c20baff3c06e2c6 changed 39-bit and 42-bit to
use the same offset), and adds a comment for the 48-bit VMA shadow map.
2025-08-06 18:06:05 -07:00
Craig Topper
886b2133e3
[RISCV] Relax one of the zexti8 in the PACKH+PACK(W)/SLLI patterns. (#152384)
For RV32 we don't need the byte shifted by 24 to be zero extended
since the extended bits are shifted out.
For RV64, we don't need the byte shifted by 24 to be zero extended
if the upper 32 bits of the result aren't demanded.
2025-08-06 17:46:43 -07:00
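The arithmetic behind the RISC-V commit above can be checked with fixed-width integers. This is a worked example only, not the instruction selection pattern itself; `shiftIntoTopByte32` and `shift64` are hypothetical helper names.

```cpp
#include <cstdint>

// 32-bit case: shifting left by 24 discards every bit above bit 7 of the
// input, so masking the input to 8 bits first makes no difference.
constexpr std::uint32_t shiftIntoTopByte32(std::uint32_t x) { return x << 24; }
static_assert(shiftIntoTopByte32(0x12345678u) == shiftIntoTopByte32(0x78u));

// 64-bit case: the un-extended bits survive in the upper half of the result...
constexpr std::uint64_t shift64(std::uint64_t x) { return x << 24; }
static_assert(shift64(0x1234u) != shift64(0x34u));

// ...but the low 32 bits agree, so the mask is unnecessary whenever the
// upper 32 bits of the result are not demanded.
static_assert(static_cast<std::uint32_t>(shift64(0x1234u)) ==
              static_cast<std::uint32_t>(shift64(0x34u)));
```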
Wenju He
3d1c1a5277
[libclc] Set TARGET_FILE property for prepare-${obj_suffix} target (#152245)
The target's output bitcode `libclc_builtins_lib` is located in a
sub-directory in clang resource directory since df7473673214. Setting
the TARGET_FILE property allows targets in non-libclc projects to obtain
the path to `libclc_builtins_lib`.
2025-08-07 08:28:43 +08:00
Daniel Paoliello
7694856fdd
Fix TargetParserTests for big-endian hosts (#152407)
The new `sys::detail::getHostCPUNameForARM` for Windows (#151596) was
implemented using a C++ bit-field, which caused the associated unit
tests to fail on big-endian machines as it assumed a little-endian
layout.

This change switches from the C++ bit-field to LLVM's `BitField` type
instead.
2025-08-06 16:50:28 -07:00
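The endianness problem in the TargetParserTests commit above comes from the fact that the bit order of a C++ bit-field is implementation-defined, while explicit shifts and masks on the integer value behave the same on every host. The sketch below shows the shift-and-mask approach in isolation. It does not use LLVM's type, and the `CpuId` struct, `decodeCpuId`, and the field layout are assumptions chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical decoded view of a 32-bit CPU identification register.
struct CpuId {
  std::uint32_t revision;    // bits [3:0]
  std::uint32_t part;        // bits [15:4]
  std::uint32_t implementer; // bits [31:24]
};

// Extracts `width` bits starting at bit `offset`. This operates on the
// numeric value, so the result does not depend on host byte order or on how
// the compiler lays out bit-fields.
constexpr std::uint32_t extractBits(std::uint32_t raw, unsigned offset,
                                    unsigned width) {
  return (raw >> offset) & ((1u << width) - 1u);
}

constexpr CpuId decodeCpuId(std::uint32_t raw) {
  return CpuId{extractBits(raw, 0, 4), extractBits(raw, 4, 12),
               extractBits(raw, 24, 8)};
}
```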
Finn Plummer
acb5d0c211
[NFC][HLSL] Replace uses of getResourceName/printEnum (#152211)
Introduce the `enumToStringRef` function into `ScopedPrinter.h` that
replicates `enumToString` behaviour, except that instead of returning a
hex value string, it just returns an empty string. This allows us to
return a StringRef and easily check if an invalid enum was provided
based on the StringRef size.

This then uses `enumToStringRef` to remove the redundant
`getResourceName` and `printEnum` functions.

Resolves: https://github.com/llvm/llvm-project/issues/151200.
2025-08-06 16:35:16 -07:00
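The "empty string means invalid enum" convention from the HLSL commit above can be modelled with `std::string_view`. This is a standalone sketch, not the `ScopedPrinter.h` implementation: `ResourceKind`, `EnumName`, `kResourceKindNames`, and `nameOrEmpty` are all hypothetical names.

```cpp
#include <string_view>

// Hypothetical enum and name table, for illustration only.
enum class ResourceKind { Texture = 1, Sampler = 2 };

struct EnumName {
  std::string_view name;
  ResourceKind value;
};

constexpr EnumName kResourceKindNames[] = {
    {"Texture", ResourceKind::Texture},
    {"Sampler", ResourceKind::Sampler},
};

// Returns a non-owning view of the matching name, or an empty view when the
// value is not in the table. No string is allocated, and the caller can
// detect an invalid value by checking empty().
constexpr std::string_view nameOrEmpty(ResourceKind value) {
  for (const EnumName &entry : kResourceKindNames)
    if (entry.value == value)
      return entry.name;
  return {};
}
```

Returning a hex string for unknown values would force the function to own storage, which is why a view-returning variant needs a different fallback.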
Uzair Nawaz
c4846d29cd
[libc] Move CharacterConverter template specialization to cpp file (#152405)
Fixes build errors caused by #152204
2025-08-06 23:32:23 +00:00
Florian Mayer
a7f1702f2c
[NFC] [CFI] correct comment in test (#152399)
It incorrectly stated that `const char*` gets normalized to ptr, while
it should say that `char*` does.
2025-08-06 16:07:40 -07:00
Valentin Clement (バレンタイン クレメン)
7d3134f6cc
Revert "[flang][cuda] Set the allocator of derived type component after allocation" (#152402)
Reverts llvm/llvm-project#152379

Buildbot failure
https://lab.llvm.org/buildbot/#/builders/207/builds/4905
2025-08-06 15:55:53 -07:00
Uzair Nawaz
e83abd774a
[libc] Template StringConverter pop function to avoid duplicate code (#152204)
Addressed TODO to template the StringConverter pop functions to have a
single implementation (combine popUTF8 and popUTF32 into a single
templated pop function)
2025-08-06 15:46:41 -07:00
Qiongsi Wu
09dbdf6514
[clang][Dependency Scanning] Move Module Timestamp Update After Compilation Finishes (#151774)
When two threads are accessing the same `pcm`, it is possible that the
reading thread sees the timestamp update, while the file on disk is not
updated.

This PR moves timestamp update from `writeAST` to
`compileModuleAndReadASTImpl`, so we only update the timestamp after the
file has been committed to disk.

rdar://152097193
2025-08-06 15:39:37 -07:00
Stanislav Mekhanoshin
b296ea9c14
[AMDGPU] s_get_shader_cycles_u64 gfx1250 instruction (#152390)
It is the same as reading SHADER_CYCLES_LO and SHADER_CYCLES_HI
but with a single instruction.
2025-08-06 15:32:28 -07:00
Andrew Lazarev
f61526971f
Revert "[WebAssembly] Constant fold wasm.dot" (#152382)
Reverts llvm/llvm-project#149619

It breaks ubsan bot:
https://lab.llvm.org/buildbot/#/builders/25/builds/10523

Earlier today the failure was hidden by another breakage that is fixed
now.
2025-08-06 15:16:19 -07:00
Valentin Clement (バレンタイン クレメン)
d897355876
[flang][cuda] Set the allocator of derived type component after allocation (#152379)
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
2025-08-06 15:14:00 -07:00
lntue
885ddf4a3a
[libc] Fix constexpr FPUtils rounding_mode.h functions. (#152342) 2025-08-06 22:05:12 +00:00
Aiden Grossman
d54aa36146
[CI] Refactor monolithic-* scripts to use common utils.sh
This patch refactors big chunks of the common functionality shared
between monolithic-linux.sh and monolithic-windows.sh to a separate
script, utils.sh, that is then sourced in both of the files. This makes
it a bit easier to maintain the scripts.

Platform differences should not be a large deal for the setup here as we
are using bash as the shell on both Linux and Windows.

Reviewers:
lnihlen, gburgessiv, Keenuts, DavidSpickett, dschuff, cmtice, Endilll

Reviewed By: DavidSpickett, cmtice

Pull Request: https://github.com/llvm/llvm-project/pull/152199
2025-08-06 15:00:51 -07:00
Md Abdullah Shahneous Bari
281e6d2cc4
[mlir][ExecutionEngine] Add LevelZeroRuntimeWrapper. (#151038)
Adds LevelZeroRuntime wrapper and tests.

Co-authored-by: Artem Kroviakov <artem.kroviakov@intel.com>
Co-authored-by: Nishant Patel <nishant.b.patel@intel.com>
2025-08-06 16:48:59 -05:00
Stanislav Mekhanoshin
66392a8d8d
[AMDGPU] Add XNACK_STATE_PRIV and _MASK gfx1250 registers (#152374)
Co-authored-by: Pierre Vanhoutryve <pierre.vanhoutryve@amd.com>
2025-08-06 14:44:17 -07:00
hidekisaito
83e5a99ff6
[AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882)
Enables AMD data center class GPUs to use memory manager memory pooling
up to 3GB allocation by default, up from the "1 << 13" threshold that
all plugin-nextgen devices use.
2025-08-06 14:41:20 -07:00
Stanislav Mekhanoshin
c3103068b7
[AMDGPU] Add more gfx1250 MC tests. NFC. (#152388)
These are already working, but were left downstream.
2025-08-06 14:38:28 -07:00
Jonas Devlieghere
87404eaf04
[lldb] Fix undefined behavior in DWARFExpressionTest
RegisterInfo is a trivial class and doesn't default initialize its
members. Thanks Alex for getting to the bottom of this.
2025-08-06 14:32:41 -07:00
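The class of bug fixed by the lldb commit above is easy to reproduce with any trivial aggregate. The sketch below uses a hypothetical `Info` struct as a stand-in and shows value-initialization as one way to avoid it; it does not claim to be the change made in the test.

```cpp
#include <cstdint>

// Hypothetical stand-in for a trivial aggregate with no default member
// initializers.
struct Info {
  const char *name;
  std::uint32_t byte_size;
  std::uint32_t byte_offset;
};

// `Info info;` would leave every member with an indeterminate value, and
// reading one of them is undefined behavior. Empty braces value-initialize
// the aggregate instead, which zeroes all members.
inline Info makeZeroedInfo() {
  Info info{};
  return info;
}
```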
Stanislav Mekhanoshin
184821b63d
[AMDGPU] Add gfx1250 DS MC tests. NFC. (#152378) 2025-08-06 14:15:35 -07:00
Shilei Tian
351b38f266
[AMDGPU] Mark address space cast from private to flat as divergent if target supports globally addressable scratch (#152376)
Globally addressable scratch is a new feature introduced in gfx1250.
However, this feature changes how scratch space is mapped into the flat
aperture, making address space casts from private to flat no longer
uniform.
2025-08-06 17:08:56 -04:00
Jordan Rupprecht
381623eb11
[bazel] Port #151228: BFloat16 (#152377) 2025-08-06 15:35:03 -05:00
Stanislav Mekhanoshin
d1b6ce50df
[AMDGPU] gfx1250 has fixed GETPC bug and also extended VA to 57 bits (#152373) 2025-08-06 13:32:26 -07:00
cmtice
5a47a1828a
[libcxx] Update testing documentation about CI container images. (#149192)
Add information to the libcxx testing documentation, about the names of
the new CI libcxx runner sets, their current values, and how to change
the values or the runner set being used.
2025-08-06 13:14:47 -07:00
erichkeane
26dde15ed4
[OpenACC] Add warning for VLAs in a private/firstprivate clause
private/firstprivate typically do copy operations; however, copying a VLA
isn't really possible. This patch introduces a warning to alert the
user that this copy isn't happening correctly.

As a future direction, we MIGHT consider doing additional work to make
sure they are initialized/copied/deleted/etc correctly.
2025-08-06 13:14:20 -07:00
Stanislav Mekhanoshin
c2eddec4ff
[AMDGPU] System scope atomics are emulated over PCIe in gfx1250 (#152369)
HW will emulate unsupported PCIe atomics via CAS loop, we do not need to
expand these anymore.
2025-08-06 13:08:12 -07:00
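For readers unfamiliar with the term, a CAS loop builds an arbitrary read-modify-write operation out of compare-and-swap. The gfx1250 commit above says the hardware now does this itself for unsupported PCIe atomics; the sketch below is only a software analogue using `std::atomic`, with `fetchMaxViaCas` as a hypothetical name.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>

// Emulates an atomic fetch-max with a compare-and-swap loop and returns the
// value observed before the update.
inline std::uint32_t fetchMaxViaCas(std::atomic<std::uint32_t> &target,
                                    std::uint32_t value) {
  std::uint32_t observed = target.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak reloads `observed` with the current
  // value, so the desired value is recomputed on every retry.
  while (!target.compare_exchange_weak(observed, std::max(observed, value))) {
  }
  return observed;
}
```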