548758 Commits

Author SHA1 Message Date
Bill Wendling
139bde2035
[llvm] Ignore coding assistant artifacts (#153853)
Now that "vibe coding" is a thing, ignore the documentation artifacts
that coding assistants, like Claude and Gemini, use to retain coding
workflows and other metadata.
2025-08-15 12:27:54 -07:00
Alexey Bataev
09f5b9ab0a Revert "[SLP]Do not include copyable data to the same user twice"
This reverts commit 758c6852c3ffe6b5e259cafadd811e60d8c276fb to fix
buildbot  https://lab.llvm.org/buildbot/#/builders/195/builds/13298
2025-08-15 12:08:31 -07:00
Jasmine Tang
d7a29e5d56
[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC (#153703)
This PR reapplies https://github.com/llvm/llvm-project/pull/149461

In the original `combineVectorSizedSetCCEquality`, the result of setcc
is being negated by returning setcc with the same cond code, leading to
wrong logic.

For example, with
```llvm
 %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
  %res = icmp eq i32 %cmp_16, 0
```

the original PR producese all_true and then also compares the result
equal to 0 (using the same SETEQ in the returning setcc), meaning that
semantically, it effectively is calling icmp ne.

Instead, the PR should have use SETNE in the returning setcc, this way,
all true return 1, then it is compared again ne 0, which is equivalent
to icmp eq.
2025-08-15 12:06:47 -07:00
Abhinav Gaba
79cf877627
[Offload] Introduce dataFence plugin interface. (#153793)
The purpose of this fence is to ensure that any `dataSubmit`s inserted
into a queue before a `dataFence` finish before finish before any
`dataSubmit`s
inserted after it begin.

This is a no-op for most queues, since they are in-order, and by design
any operations inserted into them occur in order.

But the interface is supposed to be functional for out-of-order queues.

The addition of the interface means that any operations that rely on
such ordering (like ATTACH map-type support in #149036) can invoke it,
without worrying about whether the underlying queue is in-order or
out-of-order.

Once a plugin supports out-of-order queues, the plugin can implement
this function, without requiring any change at the libomptarget level.

---------

Co-authored-by: Alex Duran <alejandro.duran@intel.com>
2025-08-15 11:49:35 -07:00
zGoldthorpe
82caa251d4
[InstCombine] Fold integer unpack/repack patterns through ZExt (#153583)
This patch explicitly enables the InstCombiner to fold integer
unpack/repack patterns such as

```llvm
define i64 @src_combine(i32 %lower, i32 %upper) {
  %base = zext i32 %lower to i64

  %u.0 = and i32 %upper, u0xff
  %z.0 = zext i32 %u.0 to i64
  %s.0 = shl i64 %z.0, 32
  %o.0 = or i64 %base, %s.0

  %r.1 = lshr i32 %upper, 8
  %u.1 = and i32 %r.1, u0xff
  %z.1 = zext i32 %u.1 to i64
  %s.1 = shl i64 %z.1, 40
  %o.1 = or i64 %o.0, %s.1

  %r.2 = lshr i32 %upper, 16
  %u.2 = and i32 %r.2, u0xff
  %z.2 = zext i32 %u.2 to i64
  %s.2 = shl i64 %z.2, 48
  %o.2 = or i64 %o.1, %s.2

  %r.3 = lshr i32 %upper, 24
  %u.3 = and i32 %r.3, u0xff
  %z.3 = zext i32 %u.3 to i64
  %s.3 = shl i64 %z.3, 56
  %o.3 = or i64 %o.2, %s.3

  ret i64 %o.3
}
; =>
define i64 @tgt_combine(i32 %lower, i32 %upper) {
  %base = zext i32 %lower to i64
  %upper.zext = zext i32 %upper to i64
  %s.0 = shl nuw i64 %upper.zext, 32
  %o.3 = or disjoint i64 %s.0, %base
  ret i64 %o.3
}
```

Alive2 proofs: [YAy7ny](https://alive2.llvm.org/ce/z/YAy7ny)
2025-08-15 12:48:32 -06:00
Alexey Bataev
758c6852c3 [SLP]Do not include copyable data to the same user twice
If the copyable schedule data is created and the user is used several
times in the user node, no need to count same data for the same user
several times, need to include it only ones.

Fixes #153754
2025-08-15 11:47:35 -07:00
Erich Keane
dcdbd5b55d
[OpenACC][NFCI] Implement 'recipe' generation for firstprivate copy (#153622)
The 'firstprivate' clause requires that we do a 'copy' operation, so
this patch creates some AST nodes from which we can generate the copy
operation, including a 'temporary' and array init. For the most part
this is pretty similar to what 'private' does other than the fact that
the source is copy (and not default init!), and that there is a
temporary from which to copy.

---------

Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
2025-08-15 18:42:40 +00:00
Stanislav Mekhanoshin
29976f2e58
[AMDGPU] Handle S_GETREG_B32 hazard on gfx1250 (#153848)
GFX1250 SPG says: S_GETREG_B32 does not wait for idle before executing.
The user must S_WAIT_ALU 0 before S_GETREG_B32 on:
STATUS, STATE_PRIV, EXCP_FLAG_PRIV, or EXCP_FLAG_USER.
2025-08-15 11:38:22 -07:00
XChy
3a4a60deff
[VectorCombine] Apply InstSimplify in scalarizeOpOrCmp to avoid infinite loop (#153069)
Fixes #153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we
may fold
```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```

to

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```
And it would be folded back in `foldInsExtBinop`, resulting in an
infinite loop.

This patch forces scalarization iff InstSimplify can fold the constant
expression.
2025-08-15 18:38:04 +00:00
Dave Lee
1dc0005d6d
Revert "[lldb] Fallback to expression eval when Dump of variable fails in dwim-print" (#153824)
Reverts llvm/llvm-project#151374

Superseded by https://github.com/llvm/llvm-project/pull/152417
2025-08-15 11:29:31 -07:00
Stanislav Mekhanoshin
5d28284dbb
[AMDGPU] gfx1250 does not need nop before VGPR dealloc (#153844)
This has no impact as the dealloc is now practically disabled.
2025-08-15 11:29:02 -07:00
Valentin Clement (バレンタイン クレメン)
3720d8b52d
[flang][cuda] Update some bind name to fast version and add __sincosf (#153744)
Use the fast version in the bind name and reorder these fast math
functions. Add missing __sincosf interface.
2025-08-15 11:07:15 -07:00
Aaron Ballman
ed6d505fab
[C][Docs] Add backported language features (#153837)
We've backported a lot more features from C to previous C standards than
we were documenting. I took a pass over the c_status page for Clang and
pulled more entries to add to our documentation.
2025-08-15 13:59:41 -04:00
Kaitlin Peng
0bb1af478a
[DirectX] Add GlobalDCE pass after finalize linkage pass in DirectX backend (#151071)
Fixes #139023.

This PR essentially removes unused global variables:
- Restores the `GlobalDCE` Legacy pass and adds it to the DirectX
backend after the finalize linkage pass
- Converts external global variables with no usage to internal linkage
in the finalize linkage pass
  - (so they can be removed by `GlobalDCE`)
- Makes the `dxil-finalize-linkage` pass usable using the new pass
manager flag syntax
- Adds tests to `finalize_linkage.ll` that make sure unused global
variables are removed
- Adds a use for variable `@CBV` in `opaque-value_as_metadata.ll` so it
isn't removed
- Changes the `scalar-data.ll` run command to avoid removing its global
variables

---------

Co-authored-by: Farzon Lotfi <farzonlotfi@microsoft.com>
2025-08-15 10:45:34 -07:00
Aiden Grossman
069f8121e0
[X86] Add RCU for Skylake Models (#153832)
We cannot actually retire an infinite number of uops per cycle. This
patch adds a RCU to the skylake scheduling model to fix this. I'm
purposefully using a loose upper bound here. We're unlikely to actually
get four fused uops per cycle, but this is better than not setting
anything. Most realistic code I've put through uiCA will retire up to ~6
uops per cycle.

Information taken from
https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client).

This requires modification of the two zero idiom tests because we do not
currently model the CPU frontend which would likely be the actual
bottleneck in that case.

Related to #153747.
2025-08-15 10:33:26 -07:00
Valentin Clement (バレンタイン クレメン)
115f816069
[flang][cuda] Add missing bind name for __int2double_rn (#153720) 2025-08-15 10:27:19 -07:00
Valentin Clement (バレンタイン クレメン)
0e4af726cb
[flang][cuda] Add interface for __fdividef (#153742) 2025-08-15 10:26:40 -07:00
Valentin Clement (バレンタイン クレメン)
0e8c964c21
[flang][cuda] Add interfaces for double_as_longlong and longlong_as_double (#153719) 2025-08-15 17:26:11 +00:00
Alex MacLean
bc77363235
[NVPTX] Do not mark move of global address as cheap enabling more CSE (#153730) 2025-08-15 10:17:34 -07:00
Valentin Clement (バレンタイン クレメン)
fd3f052aeb
[flang][cuda] Add interfaces for int_as_float and float_as_int (#153716) 2025-08-15 10:00:53 -07:00
Simon Pilgrim
92cb0414ca [X86] avx512vnni-builtins.c / avx512vlvnni-builtins.c - add C/C++ and 32/64-bit test coverage 2025-08-15 17:55:33 +01:00
asraa
b045729eb4
[mlir][presburger] add functionality to compute local mod in IntegerRelation (#153614)
Similar to `IntegerRelation::addLocalFloorDiv`, this adds a utility
`IntegerRelation::addLocalModulo` that adds and returns a local variable
that is the modulus of an affine function of the variables modulo some
constant modulus. The function returns the absolute index of the new var
in the relation.

This is computed by first finding the floordiv of `exprs // modulus = q`
and then computing the remainder `result = exprs - q * modulus`.

Signed-off-by: Asra Ali <asraa@google.com>
2025-08-15 09:55:13 -07:00
zGoldthorpe
a8d25683ee
[PatternMatch] Allow m_ConstantInt to match integer splats (#153692)
When matching integers, `m_ConstantInt` is a convenient alternative to
`m_APInt` for matching unsigned 64-bit integers, allowing one to
simplify

```cpp
const APInt *IntC;
if (match(V, m_APInt(IntC))) {
  if (IntC->ule(UINT64_MAX)) {
    uint64_t Int = IntC->getZExtValue();
    // ...
  }
}
```
to
```cpp
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // ...
}
```

However, this simplification is only true if `V` is a scalar type.
Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt`
does not.

This patch ensures that the matching behaviour of `m_ConstantInt`
parallels that of `m_APInt`, and also incorporates it in some obvious
places.
2025-08-15 10:43:54 -06:00
keinflue
af96ed6bf6
[clang] Inject IndirectFieldDecl even if name conflicts. (#153140)
This modifies InjectAnonymousStructOrUnionMembers to inject an
IndirectFieldDecl and mark it invalid even if its name conflicts with
another name in the scope.

This resolves a crash on a further diagnostic
diag::err_multiple_mem_union_initialization which via
findDefaultInitializer relies on these declarations being present.

Fixes #149985
2025-08-15 09:43:29 -07:00
Simon Pilgrim
2c20a9bfb3 [X86] avx512bf16-builtins.c / avx512vlbf16-builtins.c - add C/C++ and 32/64-bit test coverage 2025-08-15 17:38:43 +01:00
CatherineMoore
3a8f579a23
[OpenMP] Update printf statement with missing argument. (#153704) 2025-08-15 16:34:00 +00:00
Valentin Clement (バレンタイン クレメン)
583499a8cf
[flang][cuda] Add missing bind name for __hiloint2double, __double2loint and __double2hiint (#153713) 2025-08-15 09:32:59 -07:00
Andrey Timonin
dfa1335db1
[mlir][emitc] Add verification for the emitc.get_field op (#152577)
This MR adds a `verifier` for the `emitc.get_field` op. 
- The `verifier` checks that the `emitc.get_field` operation is nested
  inside an `emitc.class` op.
- Additionally, appropriate tests for erroneous cases were added for
  class-related operations in `invalid_ops.mlir`.
2025-08-15 18:32:12 +02:00
Min-Yih Hsu
0e9b6d6c8a
[IA][RISCV] Detecting gap mask from a mask assembled by interleaveN intrinsics (#153510)
If the mask of a (fixed-vector) deinterleaved load is assembled by
`vector.interleaveN` intrinsic, any intrinsic arguments that are
all-zeros are regarded as gaps.
2025-08-15 09:22:47 -07:00
Phoebe Wang
b0d2b57f7e
[Headers][X86] Remove more duplicated typedefs (#153820)
They are defined in mmintrin.h
2025-08-16 00:21:40 +08:00
Shubham Sandeep Rastogi
cd0bf2735b Revert "[LLDB] Update DIL handling of array subscripting. (#151605)"
This reverts commit 6d3ad9d9fd830eef0ac8a9d558e826b8b624e17d.

This was reverted because it broke the LLDB greendragon bot.
2025-08-15 09:17:33 -07:00
Craig Topper
853094fd81 [VirtRegMap] Use TRI member variable. NFC 2025-08-15 09:14:09 -07:00
George Burgess IV
c10766cf49
[utils] add stop_at_sha to revert_checker's API (#152011)
This is useful for downstream consumers of this as a module. It's
unclear if interactive use wants this lever, but support can easily be
added if so.
2025-08-15 16:13:29 +00:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Daniel Paoliello
1d1e52e614
[win][x64] Allow push/pop for stack alloc when unwind v2 is required (#153621)
While attempting to enable Windows x64 unwind v2, compilation failed
with the following error:

```
fatal error: error in backend: Windows x64 Unwind v2 is required, but LLVM has generated incompatible code in function '<redacted>': Cannot pop registers before the stack allocation has been deallocated
```

I traced this down to an optimization in `X86FrameLowering`:

<6961139ce9/llvm/lib/Target/X86/X86FrameLowering.cpp (L324-L340)>

Technically, using `push`/`pop` to adjust the stack is permitted under
unwind v2: the requirement for a "canonical" epilog is that the stack is
fully adjusted before the registers listed as pushed in the unwind table
are popped. So, as long as the `.seh_unwindv2start` pseudo is after the
pops that adjust the stack, then everything will work correctly.

One other side effect of this change is that the stack is now allowed to
be adjusted across multiple instructions, which would be needed for
extremely large stack frames.
2025-08-15 09:03:44 -07:00
Leandro Lacerda
08ff017fb0
[libc] Improve GPU benchmarking (#153512)
This patch improves the GPU benchmarking in this way:

* Replace `rand`/`srand` with a deterministic per-thread RNG seeded by
`call_index`: reproducible, apples-to-apples libc vs vendor comparisons.
* Fix input generation: sample the unbiased exponent uniformly in
`[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and
`+0.0`.
* Fix standard deviation: use an explicit estimator from sums and
sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples.
* Fix throughput overhead: subtract a loop-only baseline inside
NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call
already corrected (no `overhead()` call).
* Adapt existing math benchmarks to the new RNG/timing plumbing (plumb
`call_index`, drop `rand/srand`, clean includes).
* Correct inter-thread aggregation: use iteration-weighted pooling to
compute the global mean/variance, ensuring statistically sound `Cycles
(Mean)` and `Stddev`.
* Remove `Time / Iteration` column from the results table: it reported
per-thread convergence time (not per-call latency) and was
redundant/misleading next to `Cycles (Mean)`.
* Remove unused `BenchmarkLogger` files: dead code that added
maintenance and cognitive overhead without providing functionality.

---

## TODO (before merge)

* [ ] Investigate compiler warnings and address their root causes.
* [x] Review how per-thread results are aggregated into the overall
result.

## Follow-ups (future PRs)

* Add support to run throughput benchmarks with uniform (linear) input
distributions, alongside the current log2-uniform scheme.
* Review/adjust the configuration and coverage of existing math
benchmarks.
* Add more math benchmarks (e.g., `exp`/`expf`, others).
2025-08-15 11:00:17 -05:00
Ramkumar Ramachandra
f34326dac8
[VPlan] Introduce vputils::onlyScalarValuesUsed (NFC) (#153577) 2025-08-15 15:55:59 +00:00
Shafik Yaghmour
868efdcf38
[Clang][Bytecode][NFC] Move Result into APSInt constructor (#153664)
Static analysis flagged this line because we are copying Result instead
of moving it.
2025-08-15 08:52:49 -07:00
Dave Lee
ae7e1b82fe
[lldb] Print ValueObject when GetObjectDescription fails (#152417)
This fixes a few bugs, effectively through a fallback to `p` when `po` fails.

The motivating bug this fixes is when an error within the compiler causes `po` to fail.
Previously when that happened, only its value (typically an object's address) was
printed – and problematically, no compiler diagnostics were shown. With this change,
compiler diagnostics are shown, _and_ the object is fully printed (ie `p`).

Another bug this fixes is when `po` is used on a type that doesn't provide an object
description (such as a struct). Again, the normal `ValueObject` printing is used.

Additionally, this also improves how lldb handles an object description method that
fails in some way. Now an error will be shown (it wasn't before), and the value will be
printed normally.
2025-08-15 08:37:26 -07:00
Tim Gymnich
ffaba758fb
[MLIR][ROCDL] Add permlane16.swap and permanlane32.swap (#153804)
add rocdl.permlane16.swap and rocdl.permanlane32.swap
2025-08-15 17:35:31 +02:00
Simon Pilgrim
38eb14f27c [X86] avx512vbmi2-builtins.c / avx512vlvbmi2-builtins.c - add C/C++ and 32/64-bit test coverage 2025-08-15 16:35:16 +01:00
Simon Pilgrim
7df862818e [X86] avx512vbmi-builtins.c / avx512vbmivl-builtin.c - add C/C++ and 32/64-bit test coverage 2025-08-15 16:35:15 +01:00
Tim Renouf
f279c47cb3
AMDGPU gfx12: Add _dvgpr$ symbols for dynamic VGPRs (#148251)
For each function with the AMDGPU_CS_Chain calling convention, with
dynamic VGPRs enabled, add a _dvgpr$ symbol, with the value of the
function symbol, plus an offset encoding one less than the number of
VGPR blocks used by the function (16 VGPRs per block, no more than 128)
in bits 5..3 of the symbol value. This is used by a front-end to have
functions that are chained rather than called, and a dispatcher that
dynamically resizes the VGPR count before dispatching to a function.
2025-08-15 16:33:06 +01:00
Aiden Grossman
0b04168948
[CI] Add Basic Bazel Checks (#153740)
Having basic checks (like running buildifier) on the upstream bazel
files would be helpful for contributors maintaining the bazel build. Add
basic checks (currently just buildifier) to a workflow that runs
whenever the bazel build files change.
2025-08-15 08:30:07 -07:00
cmtice
6d3ad9d9fd
[LLDB] Update DIL handling of array subscripting. (#151605)
This updates the DIL code for handling array subscripting to more
closely match and handle all the cases from the original 'frame var'
implementation. Also updates the DIL array subscripting test. This
particularly fixes some issues with handling synthetic children, objc
pointers, and accessing specific bits within scalar data types.
2025-08-15 08:26:45 -07:00
Nikita Popov
11c2240049 [SDAGBuilder] Rename RetTys -> RetVTs (NFC)
Make it clearer that this is a vector of EVTs, not IR types.

Based on:
https://github.com/llvm/llvm-project/pull/153798#discussion_r2279066696
2025-08-15 17:06:33 +02:00
Philip Reames
606937474e
[SDAG] Remove IndexType manipulation in getUniformBase and callers (#151578)
All paths set it to the same value, just propagate that value to the
consumer.
2025-08-15 08:00:47 -07:00
Florian Hahn
2b1e06598f
[LV] Regenerate some more check lines. (NFC) 2025-08-15 15:53:19 +01:00
Alexey Bataev
13b54f7dc1 [SLP] Recalculate dependencies for potential control dependencies if cleared
If the control dependecies are cleared after calcellation of the
copyables, need to reclculate them unconditionally.

Fixes #153754 #153676
2025-08-15 07:52:10 -07:00
Phoebe Wang
f24d91eb2c
[Headers][X86] Remove duplicate __v8hu, NFCI (#153734)
Newly added in xmmintrin.h by c8312bdd1665225c585dd2b0bff5e46d569edd45
2025-08-15 22:48:59 +08:00