llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	b83f7f195c	[Headers][X86] Update SSE/AVX and/andnot/or/xor intrinsics to be used in constexpr (#152305 )	2025-08-07 08:16:26 +01:00
Matt Arsenault	f44d8d583c	AMDGPU: Add a few missing mfma rewrite tests (#152434 ) Test other splitting situations that appear in greedy. This includes ensuring we have a case that hits a local split and instruction split (most of the tests hit the region split path). Also test a few cases where the final result isn't fully used, resulting in partial copy bundles instead of a simple full copy. Test physreg and virtreg agpr interference with a reassignment candidate. I'm accumulating too many failure cases, and MIR tests are very prone to painful merge conflicts, so I've added a few more tests and extracted new tests from #147975. Closes #149026	2025-08-07 16:14:45 +09:00
Matthias Springer	a3e0685529	[mlir][Transforms] More detailed error message when new IR cannot be legalized (#152297 ) Print a more detailed error message when new/modified IR could not be legalized with `allowPatternRollback = false`. This is useful to understand why a pattern is incompatible with the new One-Shot Dialect Conversion driver. --------- Co-authored-by: Jeremy Kun <jkun@google.com>	2025-08-07 09:14:24 +02:00
Matt Arsenault	1110e2ff9f	InlineFunction: Split inlining into predicate and apply functions (#134213 ) This is to support a new inline function reduction in llvm-reduce, which should pre-filter callsites that are not eligible for inlining. This code was mostly structured as a match and apply, with a few exceptions. The ugliest piece is for propagating and verifying compatible getGC and personalities. Also collection of EHPad and the convergence token to use are now cached in InlineFunctionInfo. I was initially confused by the split between the checks performed here and isInlineViable, so better document how this system is supposed to work. It turns out this split does make sense, in that isInlineViable checks if it's possible based on the callee content and the ultimate inline depends on the callsite context. I think more renames of these functions would help, and isInlineViable should probably move out of InlineCost to be with these transforms.	2025-08-07 16:13:36 +09:00
Nikita Popov	406d9b1dd6	[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319 ) The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).	2025-08-07 09:12:40 +02:00
Simon Pilgrim	edad89e4e0	[Headers][X86] Update MMX arithmetic intrinsics to be used in constexpr (#152296 ) Update the easy add/sub/mul/logic/cmp/scalar_to_vector intrinsics to be constexpr compatible. I'm not expecting anyone to be very interested in using MMX intrinsics, but they're smaller than the other types and are useful to test the constexpr handling and test methods before we start applying them to SSE/AVX2/AVX512 intrinsics.	2025-08-07 08:05:05 +01:00
Florian Hahn	a485e0eae0	[VPlan] Retrieve vector TC for epilogue from resume phi (NFC). Instead of relying on getOrCreateVectorTripCount to initialize EPI.VectorTripCount, delay initialization after we retrieved the resume phi and get the trip count from there. This makes the code independent of legacy vector trip count creation.	2025-08-07 07:52:35 +01:00
Matthias Springer	71832a3139	[mlir][Transforms] Make lookup without type converter unambiguous (#151747 ) When a conversion pattern is initialized without a type converter, the driver implementation currently looks up the most recently mapped value. This is undesirable because the most recently mapped value could be a materialization. I.e., the type of the value being looked up could depend on which other patterns have run before. Such an implementation makes the type conversion infrastructure fragile and unpredictable. The current implementation also contradicts the documentation in the markdown file. According to that documentation, the values provided by the adaptor should match the types of the operands of the match operation when running without a type converter. This mechanism is not desirable, either, for two reasons: 1. Some patterns have started to rely on receiving the most recently mapped value. Changing the behavior to the documented behavior will cause regressions. (And there would be no easy way to fix those without forcing the use of a type converter or extending the `getRemappedValue` API.) 2. It is more useful to receive the most recently mapped value. A value of the original operand type can be retrieved by using the operand of the matched operation. The adaptor is not needed at all in that case. To implement the new behavior, materializations are now annotated with a marker attribute. The marker is needed because not all `unrealized_conversion_cast` ops are materializations that act as "pure type conversions". E.g., when erasing an operation, its results are mapped to newly-created "out-of-thin-air values", which are materializations (with no input) that should be treated like regular replacement values during a lookup. 
This marker-based lookup strategy is also compatible with the One-Shot Dialect Conversion implementation strategy, which does not utilize the mapping infrastructure anymore and queries all necessary information by examining the IR.	2025-08-07 08:41:28 +02:00
Matthias Springer	0a72e6ddac	[mlir][Transforms] `ConversionPatternRewriter`: Add `config` getter (#152310 ) Add a helper function to `ConversionPatternRewriter` that returns the dialect conversion configuration. This getter is useful when migrating conversion patterns to the new One-Shot Conversion Driver: patterns can check if they are running in rollback mode or not. They can then work around API changes and make sure that the pattern keeps working with both the old and new driver. Also remove the `config` field from `OperationLegalizer`. That field was never needed.	2025-08-07 08:33:24 +02:00
Ziqing Luo	0abf4975bb	[-Wunsafe-buffer-usage] Do not warn about class methods with libc function names (#151270 ) This commit fixes a false positive where C++ class methods with libc function names were incorrectly warned about. For example, ``` struct T {void strcpy() const;}; void test(const T& t) { t.strcpy(); // no warn } ``` rdar://156264388	2025-08-07 14:31:13 +08:00
Madhur Amilkanthwar	13daf3b70c	[GVN-PRE][Tests] Add MSSA coverage to some more tests [4/N] (#151919 ) This should be the final PR for tests under PRE.	2025-08-07 11:16:07 +05:30
Valentin Clement (バレンタインクレメン)	35f003d13b	[flang][cuda] Fix buildbot after #152418 (#152437 )	2025-08-06 22:24:35 -07:00
Valentin Clement (バレンタインクレメン)	eb0ddba26b	Reland "[flang][cuda] Set the allocator of derived type component after allocation" (#152418 ) Reviewed in #152379 - Move the allocator index set up after the allocate statement otherwise the derived type descriptor is not allocated. - Support array of derived-type with device component	2025-08-06 21:49:55 -07:00
Princeton Ferro	9a592d9a84	[NVPTX] lower VECREDUCE min/max to 3-input on sm_100+ (#136253 ) Add support for 3-input fmaxnum/fminnum/fmaximum/fminimum introduced in PTX 8.8 for sm_100+: - Use a tree reduction when 3-input operations are supported and the reduction has the `reassoc` flag. - If not on sm_100+/PTX 8.8, fallback to 2-input operations and use the default shuffle reduction.	2025-08-06 21:45:21 -07:00
Luke Lau	a04142f11f	[LV][RISCV] Add check lines for scalable interleave costs. NFC Previously we could only scalably vectorize interleave groups with factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now support all factors (available on RISC-V). So this adds the remaining check lines for the scalable VFs.	2025-08-07 12:28:12 +08:00
Sharjeel Khan	d9f9064cfa	[ubsan_minimal] Add address argument to Android's abort message function (#152419 ) https://github.com/llvm/llvm-project/pull/152192 forgot to make the argument changes to Android code in UBsan minimal causing a build error for Android LLVM: ``` /b/f/w/src/git/out/llvm-project/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp:102:3: error: no matching function for call to 'format_msg' 102 | format_msg(kind, caller, msg_buf, msg_buf + sizeof(msg_buf)); | ^~~~~~~~~~ /b/f/w/src/git/out/llvm-project/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp:37:13: note: candidate function not viable: requires 5 arguments, but 4 were provided 37 | static void format_msg(const char kind, uintptr_t caller, | ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 38 | const uintptr_t address, char buf, const char end) { ``` This change adds the address argument to abort_with_message just like __ubsan_report_error_fatal so it can be passed to format_msg.	2025-08-06 20:36:21 -07:00
Luke Lau	44af26ea2e	[LV] Fix EVL test after merge. NFC Test was modified in both 25d1285eecbab731eaf418c8aab44e4eb5f9e538 and df8da2ff8370fda479b5c118704af4f50e0d3536	2025-08-07 11:12:43 +08:00
Vitaly Buka	0168324523	[CI] Test compiler-rt when it's changed (#152425 )	2025-08-06 20:02:48 -07:00
Luke Lau	df8da2ff83	[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110 ) Now that VPWidenPointerInductionRecipes are modelled in VPlan in #148274, we can support them in EVL tail folding. We need to replace their VFxUF operand with EVL as the increment is not guaranteed to always be VF on the penultimate iteration, and UF is always 1 with EVL tail folding. We also need to move the creation of the backedge value to the latch so that EVL dominates it. With this we will no longer fail to convert a VPlan to EVL tail folding, so adjust tryAddExplicitVectorLength to account for this. This brings us to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail folding vs no tail folding. The test in only-compute-cost-for-vplan-vfs.ll previously relied on widened pointer inductions with EVL tail folding to end up in a scenario with no vector VPlans, so this also replaces it with an unvectorizable fixed-order recurrence test from first-order-recurrence-multiply-recurrences.ll that also gets discarded.	2025-08-07 10:54:24 +08:00
Valentin Clement (バレンタインクレメン)	a196281896	[flang][cuda] Remove meaningless warning on CUDA shared arguments (#152404 ) The warning issued during the compatibility check makes little sense. Just remove it as it is confusing.	2025-08-06 18:50:07 -07:00
Valentin Clement (バレンタインクレメン)	2696e8c149	[flang][cuda] Remove too restrictive assert for data transfer (#152398 ) When the rhs is an array element, the assert was triggered but this is still a valid transfer. Remove the assert. The operation has a verifier to check its validity.	2025-08-06 18:49:52 -07:00
Jorge Gorbe Moya	8381f95dec	[bazel] Fix mlir/tests after 281e6d2cc498d05f3ca601e3b1d595420e7ed827 (#152413 )	2025-08-06 20:36:55 -05:00
Farzon Lotfi	04672e20d4	[DirectX] ForwardHandle needs to check if globals were stored on allocas (#151751 ) Fixes #140819. The SROA pass is making it so that some globals get loaded into stack allocations. This means we find an alloca where we used to expect a load, and now need to walk an alloca -> store -> maybe load chain before we find the global. Doing so fixes all but two instances of #137715 and fixes every instance of `Load of "8.sroa.0" is not a global resource handle` we are currently seeing in the DML shaders.	2025-08-06 21:14:35 -04:00
Thurston Dang	01472d8e35	[NFC][asan] Update shadow mapping comments for AArch64 non-Android Linux (#152412 ) This adds commentary to explain why ASan does not work for AArch64 non-Android Linux with 39-bit and 42-bit VMAs (e.g., https://github.com/llvm/llvm-project/issues/145259). Additionally, it updates the 42-bit VMA shadow map comment, which has been outdated for the last 10 years (18b2258c92df93c83bc7fce94c20baff3c06e2c6 changed 39-bit and 42-bit to use the same offset), and adds a comment for the 48-bit VMA shadow map.	2025-08-06 18:06:05 -07:00
Craig Topper	886b2133e3	[RISCV] Relax one of the zexti8 in the PACKH+PACK(W)/SLLI patterns. (#152384 ) For RV32 we don't need the byte shifted by 24 to be zero extend since the extended bits are shifted out. For RV64, we don't need the byte shifted by 24 to be zero extended if the upper 32 bits of the result aren't demanded.	2025-08-06 17:46:43 -07:00
Wenju He	3d1c1a5277	[libclc] Set TARGET_FILE property for prepare-${obj_suffix} target (#152245 ) The target's output bitcode `libclc_builtins_lib` is located in a sub-directory in clang resource directory since df7473673214. Setting TARGET_FILE property can allow targets in non-libclc project to obtain the path to `libclc_builtins_lib`.	2025-08-07 08:28:43 +08:00
Daniel Paoliello	7694856fdd	Fix TargetParserTests for big-endian hosts (#152407 ) The new `sys::detail::getHostCPUNameForARM` for Windows (#151596) was implemented using a C++ bit-field, which caused the associated unit tests to fail on big-endian machines as it assumed a little-endian layout. This change switches from the C++ bit-field to LLVM's `BitField` type instead.	2025-08-06 16:50:28 -07:00
Finn Plummer	acb5d0c211	[NFC][HLSL] Replace uses of `getResourceName`/`printEnum` (#152211 ) Introduce the `enumToStringRef` enum into `ScopedPrinter.h` that replicates `enumToString` behaviour, except that instead of returning a hex value string, it just returns an empty string. This allows us to return a StringRef and easily check if an invalid enum was provided based on the StringRef size. This then uses `enumToStringRef` to remove the redundant `getResourceName` and `printEnum` functions. Resolves: https://github.com/llvm/llvm-project/issues/151200.	2025-08-06 16:35:16 -07:00
Uzair Nawaz	c4846d29cd	[libc] Move CharacterConverter template specialization to cpp file (#152405 ) Fixes build errors caused by #152204	2025-08-06 23:32:23 +00:00
Florian Mayer	a7f1702f2c	[NFC] [CFI] correct comment in test (#152399 ) It incorrectly stated that `const char` gets normalized to ptr, while it should say that `char` does.	2025-08-06 16:07:40 -07:00
Valentin Clement (バレンタインクレメン)	7d3134f6cc	Revert "[flang][cuda] Set the allocator of derived type component after allocation" (#152402 ) Reverts llvm/llvm-project#152379 Buildbot failure https://lab.llvm.org/buildbot/#/builders/207/builds/4905	2025-08-06 15:55:53 -07:00
Uzair Nawaz	e83abd774a	[libc] Template StringConverter pop function to avoid duplicate code (#152204 ) Addressed TODO to template the StringConverter pop functions to have a single implementation (combine popUTF8 and popUTF32 into a single templated pop function)	2025-08-06 15:46:41 -07:00
Qiongsi Wu	09dbdf6514	[clang][Dependency Scanning] Move Module Timestamp Update After Compilation Finishes (#151774 ) When two threads are accessing the same `pcm`, it is possible that the reading thread sees the timestamp update, while the file on disk is not updated. This PR moves timestamp update from `writeAST` to `compileModuleAndReadASTImpl`, so we only update the timestamp after the file has been committed to disk. rdar://152097193	2025-08-06 15:39:37 -07:00
Stanislav Mekhanoshin	b296ea9c14	[AMDGPU] s_get_shader_cycles_u64 gfx1250 instruction (#152390 ) It is the same as reading SHADER_CYCLES_LO and SHADER_CYCLES_HI but with a single instruction.	2025-08-06 15:32:28 -07:00
Andrew Lazarev	f61526971f	Revert "[WebAssembly] Constant fold wasm.dot" (#152382 ) Reverts llvm/llvm-project#149619 It breaks ubsan bot: https://lab.llvm.org/buildbot/#/builders/25/builds/10523 Earlier today the failure was hidden by another breakage that is fixed now.	2025-08-06 15:16:19 -07:00
Valentin Clement (バレンタインクレメン)	d897355876	[flang][cuda] Set the allocator of derived type component after allocation (#152379 ) - Move the allocator index set up after the allocate statement otherwise the derived type descriptor is not allocated. - Support array of derived-type with device component	2025-08-06 15:14:00 -07:00
lntue	885ddf4a3a	[libc] Fix constexpr FPUtils rounding_mode.h functions. (#152342 )	2025-08-06 22:05:12 +00:00
Aiden Grossman	d54aa36146	[CI] Refactor monolithic-* scripts to use common utils.sh This patch refactors big chunks of the common functionality shared between monolithic-linux.sh and monolithic-windows.sh to a separate script, utils.sh, that is then sourced in both of the files. This makes it a bit easier to maintain the scripts. Platform differences should not be a large deal for the setup here as we are using bash as the shell on both Linux and Windows. Reviewers: lnihlen, gburgessiv, Keenuts, DavidSpickett, dschuff, cmtice, Endilll Reviewed By: DavidSpickett, cmtice Pull Request: https://github.com/llvm/llvm-project/pull/152199	2025-08-06 15:00:51 -07:00
Md Abdullah Shahneous Bari	281e6d2cc4	[mlir][ExecutionEngine] Add LevelZeroRuntimeWrapper. (#151038 ) Adds LevelZeroRuntime wrapper and tests. Co-authored-by: Artem Kroviakov <artem.kroviakov@intel.com> Co-authored-by: Nishant Patel <nishant.b.patel@intel.com>	2025-08-06 16:48:59 -05:00
Stanislav Mekhanoshin	66392a8d8d	[AMDGPU] Add XNACK_STATE_PRIV and _MASK gfx1250 registers (#152374 ) Co-authored-by: Pierre Vanhoutryve <pierre.vanhoutryve@amd.com>	2025-08-06 14:44:17 -07:00
hidekisaito	83e5a99ff6	[AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882 ) Enables AMD data center class GPUs to use memory manager memory pooling up to 3GB allocation by default, up from the "1 << 13" threshold that all plugin-nextgen devices use.	2025-08-06 14:41:20 -07:00
Stanislav Mekhanoshin	c3103068b7	[AMDGPU] Add more gfx1250 MC tests. NFC. (#152388 ) These are already working, but left downstream.	2025-08-06 14:38:28 -07:00
Jonas Devlieghere	87404eaf04	[lldb] Fix undefined behavior in DWARFExpressionTest RegisterInfo is a trivial class and doesn't default initialize its members. Thanks Alex for getting to the bottom of this.	2025-08-06 14:32:41 -07:00
Stanislav Mekhanoshin	184821b63d	[AMDGPU] Add gfx1250 DS MC tests. NFC. (#152378 )	2025-08-06 14:15:35 -07:00
Shilei Tian	351b38f266	[AMDGPU] Mark address space cast from private to flat as divergent if target supports globally addressable scratch (#152376 ) Globally addressable scratch is a new feature introduced in gfx1250. However, this feature changes how scratch space is mapped into the flat aperture, making address space casts from private to flat no longer uniform.	2025-08-06 17:08:56 -04:00
Jordan Rupprecht	381623eb11	[bazel] Port #151228 : BFloat16 (#152377 )	2025-08-06 15:35:03 -05:00
Stanislav Mekhanoshin	d1b6ce50df	[AMDGPU] gfx1250 has fixed GETPC bug and also extended VA to 57 bits (#152373 )	2025-08-06 13:32:26 -07:00
cmtice	5a47a1828a	[libcxx] Update testing documentation about CI container images. (#149192 ) Add information to the libcxx testing documentation, about the names of the new CI libcxx runner sets, their current values, and how to change the values or the runner set being used.	2025-08-06 13:14:47 -07:00
erichkeane	26dde15ed4	[OpenACC] Add warning for VLAs in a private/firstprivate clause private/firstprivate typically do copy operations, however copying a VLA isn't really possible. This patch introduces a warning to alert the person that this copy isn't happening correctly. As a future direction, we MIGHT consider doing additional work to make sure they are initialized/copied/deleted/etc correctly.	2025-08-06 13:14:20 -07:00
Stanislav Mekhanoshin	c2eddec4ff	[AMDGPU] System scope atomics are emulated over PCIe in gfx1250 (#152369 ) HW will emulate unsupported PCIe atomics via CAS loop, we do not need to expand these anymore.	2025-08-06 13:08:12 -07:00

547939 Commits