Now that "vibe coding" is a thing, ignore the documentation artifacts
that coding assistants, like Claude and Gemini, use to retain coding
workflows and other metadata.
This PR reapplies https://github.com/llvm/llvm-project/pull/149461
In the original `combineVectorSizedSetCCEquality`, the result of the
setcc was negated by returning a setcc with the same cond code, leading
to wrong logic.
For example, with
```llvm
%cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
%res = icmp eq i32 %cmp_16, 0
```
the original PR produces all_true and then also compares the result
equal to 0 (reusing the same SETEQ in the returned setcc), meaning that
semantically it effectively implements icmp ne.
Instead, the PR should have used SETNE in the returned setcc; that way,
all_true returns 1, which is then compared ne 0, which is equivalent
to icmp eq.
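The inversion in scalar form (a minimal illustration; the real combine
builds SelectionDAG nodes, and `lowerMemcmpEqZero`/`allTrue` here are
just stand-ins):
```cpp
#include <cassert>

// allTrue is 1 iff every lane of the lanewise "icmp eq" was true, i.e.
// the two 16-byte blocks compare equal, so memcmp(...) == 0 holds.
bool lowerMemcmpEqZero(int allTrue) {
  // Original PR (SETEQ): true when the blocks differ -> behaves as icmp ne.
  bool withSetEq = (allTrue == 0);
  (void)withSetEq;
  // Fix (SETNE): true when the blocks are equal -> matches icmp eq.
  return allTrue != 0;
}

int main() {
  assert(lowerMemcmpEqZero(1));  // equal blocks: memcmp()==0 is true
  assert(!lowerMemcmpEqZero(0)); // differing blocks
}
```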
The purpose of this fence is to ensure that any `dataSubmit`s inserted
into a queue before a `dataFence` finish before any `dataSubmit`s
inserted after it begin.
This is a no-op for most queues, since they are in-order, and by design
any operations inserted into them occur in order.
But the interface is supposed to be functional for out-of-order queues.
The addition of the interface means that any operations that rely on
such ordering (like ATTACH map-type support in #149036) can invoke it,
without worrying about whether the underlying queue is in-order or
out-of-order.
Once a plugin supports out-of-order queues, the plugin can implement
this function, without requiring any change at the libomptarget level.
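A minimal sketch of what a plugin implementation could look like for an
out-of-order queue, assuming a CUDA-like event API (`recordEvent` and
`queueWaitEvent` are hypothetical names, not the actual plugin
interface):
```cpp
struct Queue {};
struct Event {};

// Hypothetical plugin hooks: the event completes once all work already in
// the queue has finished, and later work can be made to wait on the event.
Event *recordEvent(Queue &Q);
void queueWaitEvent(Queue &Q, Event *E);

// Fence: every dataSubmit inserted before this call finishes before any
// dataSubmit inserted after it begins, even if the queue is out-of-order.
void dataFence(Queue &Q) {
  Event *E = recordEvent(Q);
  queueWaitEvent(Q, E);
}
```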
---------
Co-authored-by: Alex Duran <alejandro.duran@intel.com>
If the copyable schedule data is created and the user is used several
times in the user node, there is no need to count the same data for the
same user several times; it should be included only once.
Fixes #153754
The 'firstprivate' clause requires that we do a 'copy' operation, so
this patch creates some AST nodes from which we can generate the copy
operation, including a 'temporary' and array init. For the most part
this is pretty similar to what 'private' does other than the fact that
the source is a copy (and not default init!), and that there is a
temporary from which to copy.
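For reference, the source-level semantics being modeled (a plain OpenACC
example, not the generated ClangIR):
```cpp
void f() {
  int x = 42;
  // 'firstprivate' gives each gang/worker its own x, initialized by copying
  // the original value; 'private' would instead default-initialize it.
#pragma acc parallel firstprivate(x)
  {
    x += 1; // operates on the copy; the original x is unchanged
  }
}
```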
---------
Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
GFX1250 SPG says: S_GETREG_B32 does not wait for idle before executing.
The user must S_WAIT_ALU 0 before S_GETREG_B32 on:
STATUS, STATE_PRIV, EXCP_FLAG_PRIV, or EXCP_FLAG_USER.
Fixes #153012
As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we
may fold
```llvm
@val = external global i8

define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
%158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
%159 = or disjoint <2 x i64> splat (i64 2), %158
store <2 x i64> %159, ptr %ptr2
ret void
}
```
to
```llvm
@val = external global i8

define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
%.scalar = or disjoint i64 2, %idx
%0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
%1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
store <2 x i64> %1, ptr %ptr2, align 16
ret void
}
```
And it would be folded back in `foldInsExtBinop`, resulting in an
infinite loop.
This patch performs the scalarization only if InstSimplify can fold the
constant expression.
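A sketch of the shape of the guard, assuming it uses `simplifyBinOp`
from InstSimplify (the helper name and exact placement in
`scalarizeOpOrCmp` are illustrative):
```cpp
#include "llvm/Analysis/InstructionSimplify.h"

using namespace llvm;

// Only scalarize when the constant halves of the binop actually fold;
// otherwise foldInsExtBinop rebuilds the insertelement form and the two
// transforms ping-pong forever.
static bool constantPartFolds(unsigned Opcode, Constant *C0, Constant *C1,
                              const SimplifyQuery &SQ) {
  return simplifyBinOp(Opcode, C0, C1, SQ) != nullptr;
}
```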
We've backported a lot more C features to previous C standards than we
were documenting. I took a pass over the c_status page for Clang and
pulled more entries to add to our documentation.
Fixes #139023.
This PR essentially removes unused global variables:
- Restores the `GlobalDCE` Legacy pass and adds it to the DirectX
backend after the finalize linkage pass
- Converts external global variables with no usage to internal linkage
in the finalize linkage pass (so they can be removed by `GlobalDCE`)
- Makes the `dxil-finalize-linkage` pass usable via the new pass manager
flag syntax (`-passes=dxil-finalize-linkage`)
- Adds tests to `finalize_linkage.ll` that make sure unused global
variables are removed
- Adds a use for variable `@CBV` in `opaque-value_as_metadata.ll` so it
isn't removed
- Changes the `scalar-data.ll` run command to avoid removing its global
variables
---------
Co-authored-by: Farzon Lotfi <farzonlotfi@microsoft.com>
We cannot actually retire an infinite number of uops per cycle. This
patch adds an RCU to the Skylake scheduling model to fix this. I'm
purposefully using a loose upper bound here. We're unlikely to actually
get four fused uops per cycle, but this is better than not setting
anything. Most realistic code I've put through uiCA will retire up to ~6
uops per cycle.
Information taken from
https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client).
This requires modification of the two zero idiom tests because we do not
currently model the CPU frontend which would likely be the actual
bottleneck in that case.
Related to #153747.
Similar to `IntegerRelation::addLocalFloorDiv`, this adds a utility
`IntegerRelation::addLocalModulo` that adds a local variable equal to
the remainder of an affine expression of the variables modulo some
constant modulus. The function returns the absolute index of the new
variable in the relation.
This is computed by first introducing the floor division `q = exprs
floordiv modulus` and then computing the remainder `result = exprs - q *
modulus`.
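A worked example of that construction (values illustrative):
```cpp
#include <cassert>

int main() {
  long e = 7, m = 3; // an affine expression value and a constant modulus
  long q = (e % m >= 0) ? e / m : e / m - 1; // e floordiv m (addLocalFloorDiv)
  long r = e - q * m;                        // the new local: e mod m, in [0, m)
  assert(q == 2 && r == 1);
}
```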
Signed-off-by: Asra Ali <asraa@google.com>
When matching integers, `m_ConstantInt` is a convenient alternative to
`m_APInt` for matching unsigned 64-bit integers, allowing one to
simplify
```cpp
const APInt *IntC;
if (match(V, m_APInt(IntC))) {
  if (IntC->ule(UINT64_MAX)) {
    uint64_t Int = IntC->getZExtValue();
    // ...
  }
}
```
to
```cpp
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // ...
}
```
However, this simplification is only true if `V` is a scalar type.
Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt`
does not.
This patch ensures that the matching behaviour of `m_ConstantInt`
parallels that of `m_APInt`, and also incorporates it in some obvious
places.
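For example (illustrative use; the helper name is made up):
```cpp
#include "llvm/IR/PatternMatch.h"

using namespace llvm;
using namespace llvm::PatternMatch;

// With this patch, a vector splat such as <4 x i32> splat (i32 7) now
// matches too, binding Int to 7, exactly as m_APInt would.
static bool matchSmallIntOrSplat(Value *V, uint64_t &Int) {
  return match(V, m_ConstantInt(Int));
}
```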
This modifies InjectAnonymousStructOrUnionMembers to inject an
IndirectFieldDecl and mark it invalid even if its name conflicts with
another name in the scope.
This resolves a crash in a subsequent diagnostic,
diag::err_multiple_mem_union_initialization, which relies (via
findDefaultInitializer) on these declarations being present.
Fixes #149985
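A hypothetical reproducer shaped like the crash described (the actual
test case is in #149985):
```cpp
struct S {
  int x;
  union {      // members are injected into S's scope; 'x' conflicts with S::x
    int x = 1; // the IndirectFieldDecl for this member is now injected
               // (marked invalid) instead of being dropped
    int y = 2; // a second default initializer in the same union triggers
               // err_multiple_mem_union_initialization, which needs the
               // injected declarations to exist
  };
};
```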
This MR adds a verifier for the `emitc.get_field` op.
- The verifier checks that the `emitc.get_field` operation is nested
inside an `emitc.class` op.
- Additionally, appropriate tests for erroneous cases were added for
class-related operations in `invalid_ops.mlir`.
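A sketch of the check, assuming the usual ODS-generated accessors (the
actual code may differ):
```cpp
// Verifier body: emitc.get_field must be nested inside an emitc.class.
mlir::LogicalResult GetFieldOp::verify() {
  if (!getOperation()->getParentOfType<emitc::ClassOp>())
    return emitOpError("must be nested within an emitc.class op");
  return mlir::success();
}
```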
If the mask of a (fixed-vector) deinterleaved load is assembled by the
`vector.interleaveN` intrinsic, any intrinsic arguments that are
all-zeros are regarded as gaps.
While attempting to enable Windows x64 unwind v2, compilation failed
with the following error:
```
fatal error: error in backend: Windows x64 Unwind v2 is required, but LLVM has generated incompatible code in function '<redacted>': Cannot pop registers before the stack allocation has been deallocated
```
I traced this down to an optimization in `X86FrameLowering`:
<6961139ce9/llvm/lib/Target/X86/X86FrameLowering.cpp (L324-L340)>
Technically, using `push`/`pop` to adjust the stack is permitted under
unwind v2: the requirement for a "canonical" epilog is that the stack is
fully adjusted before the registers listed as pushed in the unwind table
are popped. So, as long as the `.seh_unwindv2start` pseudo comes after
the pops that adjust the stack, everything will work correctly.
One other side effect of this change is that the stack is now allowed to
be adjusted across multiple instructions, which would be needed for
extremely large stack frames.
This patch improves the GPU benchmarking in the following ways:
* Replace `rand`/`srand` with a deterministic per-thread RNG seeded by
`call_index`: reproducible, apples-to-apples libc vs vendor comparisons.
* Fix input generation: sample the unbiased exponent uniformly in
`[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and
`+0.0`.
* Fix standard deviation: use an explicit estimator from sums and
sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples.
* Fix throughput overhead: subtract a loop-only baseline inside
NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call
already corrected (no `overhead()` call).
* Adapt existing math benchmarks to the new RNG/timing plumbing (plumb
`call_index`, drop `rand/srand`, clean includes).
* Correct inter-thread aggregation: use iteration-weighted pooling to
compute the global mean/variance (see the sketch after this list),
ensuring statistically sound `Cycles (Mean)` and `Stddev`.
* Remove `Time / Iteration` column from the results table: it reported
per-thread convergence time (not per-call latency) and was
redundant/misleading next to `Cycles (Mean)`.
* Remove unused `BenchmarkLogger` files: dead code that added
maintenance and cognitive overhead without providing functionality.
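A minimal sketch of the iteration-weighted pooling referenced above (a
standard parallel mean/variance merge; the names are illustrative, not
the benchmark's actual types):
```cpp
#include <cmath>
#include <cstdint>

struct ThreadStats {
  uint64_t n;  // iterations measured by this thread
  double mean; // mean cycles per call
  double m2;   // sum of squared deviations from the mean
};

// Merge two per-thread results, weighting each by its iteration count.
ThreadStats pool(const ThreadStats &a, const ThreadStats &b) {
  if (a.n == 0) return b;
  if (b.n == 0) return a;
  uint64_t n = a.n + b.n;
  double delta = b.mean - a.mean;
  double mean = a.mean + delta * (double)b.n / (double)n;
  double m2 = a.m2 + b.m2 +
              delta * delta * (double)a.n * (double)b.n / (double)n;
  return {n, mean, m2};
}

// Global stddev after folding all threads: sqrt(m2 / n).
double stddev(const ThreadStats &s) { return std::sqrt(s.m2 / (double)s.n); }
```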
---
## TODO (before merge)
* [ ] Investigate compiler warnings and address their root causes.
* [x] Review how per-thread results are aggregated into the overall
result.
## Follow-ups (future PRs)
* Add support to run throughput benchmarks with uniform (linear) input
distributions, alongside the current log2-uniform scheme.
* Review/adjust the configuration and coverage of existing math
benchmarks.
* Add more math benchmarks (e.g., `exp`/`expf`, others).
This fixes a few bugs, effectively through a fallback to `p` when `po` fails.
The motivating bug this fixes is when an error within the compiler causes `po` to fail.
Previously when that happened, only its value (typically an object's address) was
printed – and problematically, no compiler diagnostics were shown. With this change,
compiler diagnostics are shown, _and_ the object is fully printed (i.e., `p`).
Another bug this fixes is when `po` is used on a type that doesn't provide an object
description (such as a struct). Again, the normal `ValueObject` printing is used.
Additionally, this improves how lldb handles an object description method that
fails in some way. Now an error will be shown (it wasn't before), and the value will be
printed normally.
For each function with the AMDGPU_CS_Chain calling convention, with
dynamic VGPRs enabled, add a _dvgpr$ symbol, with the value of the
function symbol, plus an offset encoding one less than the number of
VGPR blocks used by the function (16 VGPRs per block, no more than 128)
in bits 5..3 of the symbol value. This is used by a front-end to have
functions that are chained rather than called, and a dispatcher that
dynamically resizes the VGPR count before dispatching to a function.
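The encoding, written out (a direct transcription of the description
above; the helper name is illustrative):
```cpp
#include <cstdint>

// 16 VGPRs per block and at most 128 VGPRs, so blocks - 1 is in [0, 7]
// and fits in bits 5..3 of the symbol value.
uint64_t dvgprSymbolValue(uint64_t funcSymbolValue, unsigned numVGPRs) {
  unsigned blocks = (numVGPRs + 15) / 16; // round up to whole VGPR blocks
  return funcSymbolValue + ((uint64_t)(blocks - 1) << 3);
}
```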
Having basic checks (like running buildifier) on the upstream bazel
files would be helpful for contributors maintaining the bazel build. Add
basic checks (currently just buildifier) to a workflow that runs
whenever the bazel build files change.
This updates the DIL code for handling array subscripting to more
closely match and handle all the cases from the original 'frame var'
implementation. Also updates the DIL array subscripting test. This
particularly fixes some issues with handling synthetic children,
Objective-C pointers, and accessing specific bits within scalar data
types.