548737 Commits

Author SHA1 Message Date
Joao Saffran
3b25b34cd9 save a copy 2025-08-19 16:35:05 -07:00
Joao Saffran
d38c00d09a remove unused 2025-08-19 16:33:07 -07:00
Joao Saffran
8eb82fdac3 adding missing import 2025-08-19 10:56:17 -07:00
Joao Saffran
1d29111c4f fix whitespace in test 2025-08-19 10:47:15 -07:00
Joao Saffran
6539364fa7 clean up 2025-08-19 10:45:49 -07:00
Joao Saffran
f6f2e61d5d removing root parameter header from MC 2025-08-19 10:34:02 -07:00
Joao Saffran
1690a9c04d clean up 2025-08-18 18:52:48 -07:00
Joao Saffran
31ec5e50ff making parameter type and shader visibility use enums 2025-08-18 18:49:45 -07:00
Phoebe Wang
b0d2b57f7e
[Headers][X86] Remove more duplicated typedefs (#153820)
They are defined in mmintrin.h
2025-08-16 00:21:40 +08:00
Shubham Sandeep Rastogi
cd0bf2735b Revert "[LLDB] Update DIL handling of array subscripting. (#151605)"
This reverts commit 6d3ad9d9fd830eef0ac8a9d558e826b8b624e17d.

This was reverted because it broke the LLDB greendragon bot.
2025-08-15 09:17:33 -07:00
Craig Topper
853094fd81 [VirtRegMap] Use TRI member variable. NFC 2025-08-15 09:14:09 -07:00
George Burgess IV
c10766cf49
[utils] add stop_at_sha to revert_checker's API (#152011)
This is useful for downstream consumers of this as a module. It's
unclear if interactive use wants this lever, but support can easily be
added if so.
2025-08-15 16:13:29 +00:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Daniel Paoliello
1d1e52e614
[win][x64] Allow push/pop for stack alloc when unwind v2 is required (#153621)
While attempting to enable Windows x64 unwind v2, compilation failed
with the following error:

```
fatal error: error in backend: Windows x64 Unwind v2 is required, but LLVM has generated incompatible code in function '<redacted>': Cannot pop registers before the stack allocation has been deallocated
```

I traced this down to an optimization in `X86FrameLowering`:

<6961139ce9/llvm/lib/Target/X86/X86FrameLowering.cpp (L324-L340)>

Technically, using `push`/`pop` to adjust the stack is permitted under
unwind v2: the requirement for a "canonical" epilog is that the stack is
fully adjusted before the registers listed as pushed in the unwind table
are popped. So, as long as the `.seh_unwindv2start` pseudo is after the
pops that adjust the stack, then everything will work correctly.

One other side effect of this change is that the stack is now allowed to
be adjusted across multiple instructions, which would be needed for
extremely large stack frames.
2025-08-15 09:03:44 -07:00
Leandro Lacerda
08ff017fb0
[libc] Improve GPU benchmarking (#153512)
This patch improves GPU benchmarking in the following ways:

* Replace `rand`/`srand` with a deterministic per-thread RNG seeded by
`call_index`: reproducible, apples-to-apples libc vs vendor comparisons.
* Fix input generation: sample the unbiased exponent uniformly in
`[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and
`+0.0`.
* Fix standard deviation: use an explicit estimator from sums and
sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples.
* Fix throughput overhead: subtract a loop-only baseline inside
NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call
already corrected (no `overhead()` call).
* Adapt existing math benchmarks to the new RNG/timing plumbing (plumb
`call_index`, drop `rand/srand`, clean includes).
* Correct inter-thread aggregation: use iteration-weighted pooling to
compute the global mean/variance, ensuring statistically sound `Cycles
(Mean)` and `Stddev`.
* Remove `Time / Iteration` column from the results table: it reported
per-thread convergence time (not per-call latency) and was
redundant/misleading next to `Cycles (Mean)`.
* Remove unused `BenchmarkLogger` files: dead code that added
maintenance and cognitive overhead without providing functionality.
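
For illustration only, the iteration-weighted pooling and the `sqrt(E[x^2] − E[x]^2)` estimator described above can be sketched as follows. This is not the libc benchmark code: the function name `pool_stats`, the `(iterations, mean, variance)` tuple layout, and the use of population (not sample) variance are all assumptions made for the sketch.

```python
import math


def pool_stats(per_thread):
    """Pool per-thread (iterations, mean, variance) triples.

    Hypothetical helper: each thread is weighted by its iteration
    count, and variances are assumed to be population variances.
    Returns the global (mean, stddev).
    """
    total_n = sum(n for n, _, _ in per_thread)
    if total_n == 0:
        raise ValueError("no samples to pool")
    # Global first moment E[x], weighted by iterations.
    mean = sum(n * m for n, m, _ in per_thread) / total_n
    # Global second moment E[x^2]; per thread, E[x^2] = var + mean^2.
    second_moment = sum(n * (v + m * m) for n, m, v in per_thread) / total_n
    # Var = E[x^2] - E[x]^2, clamped against tiny negative rounding error.
    variance = max(second_moment - mean * mean, 0.0)
    return mean, math.sqrt(variance)
```

Pooling the moments rather than averaging the per-thread standard deviations means a thread that ran few iterations does not count as much as one that ran many.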

---

## TODO (before merge)

* [ ] Investigate compiler warnings and address their root causes.
* [x] Review how per-thread results are aggregated into the overall
result.

## Follow-ups (future PRs)

* Add support to run throughput benchmarks with uniform (linear) input
distributions, alongside the current log2-uniform scheme.
* Review/adjust the configuration and coverage of existing math
benchmarks.
* Add more math benchmarks (e.g., `exp`/`expf`, others).
2025-08-15 11:00:17 -05:00
Ramkumar Ramachandra
f34326dac8
[VPlan] Introduce vputils::onlyScalarValuesUsed (NFC) (#153577) 2025-08-15 15:55:59 +00:00
Shafik Yaghmour
868efdcf38
[Clang][Bytecode][NFC] Move Result into APSInt constructor (#153664)
Static analysis flagged this line because we are copying Result instead
of moving it.
2025-08-15 08:52:49 -07:00
Dave Lee
ae7e1b82fe
[lldb] Print ValueObject when GetObjectDescription fails (#152417)
This fixes a few bugs, effectively through a fallback to `p` when `po` fails.

The motivating bug this fixes is when an error within the compiler causes `po` to fail.
Previously when that happened, only its value (typically an object's address) was
printed – and problematically, no compiler diagnostics were shown. With this change,
compiler diagnostics are shown, _and_ the object is fully printed (i.e. `p`).

Another bug this fixes is when `po` is used on a type that doesn't provide an object
description (such as a struct). Again, the normal `ValueObject` printing is used.

Additionally, this also improves how lldb handles an object description method that
fails in some way. Now an error will be shown (it wasn't before), and the value will be
printed normally.
2025-08-15 08:37:26 -07:00
Tim Gymnich
ffaba758fb
[MLIR][ROCDL] Add permlane16.swap and permlane32.swap (#153804)
Add rocdl.permlane16.swap and rocdl.permlane32.swap.
2025-08-15 17:35:31 +02:00
Simon Pilgrim
38eb14f27c [X86] avx512vbmi2-builtins.c / avx512vlvbmi2-builtins.c - add C/C++ and 32/64-bit test coverage 2025-08-15 16:35:16 +01:00
Simon Pilgrim
7df862818e [X86] avx512vbmi-builtins.c / avx512vbmivl-builtin.c - add C/C++ and 32/64-bit test coverage 2025-08-15 16:35:15 +01:00
Tim Renouf
f279c47cb3
AMDGPU gfx12: Add _dvgpr$ symbols for dynamic VGPRs (#148251)
For each function with the AMDGPU_CS_Chain calling convention, with
dynamic VGPRs enabled, add a _dvgpr$ symbol, with the value of the
function symbol, plus an offset encoding one less than the number of
VGPR blocks used by the function (16 VGPRs per block, no more than 128)
in bits 5..3 of the symbol value. This is used by a front-end to have
functions that are chained rather than called, and a dispatcher that
dynamically resizes the VGPR count before dispatching to a function.
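
A minimal sketch of the encoding described above, for illustration only. The function name `dvgpr_symbol_value` is hypothetical, and the sketch assumes that the function address has bits 5..3 clear (so adding the offset cannot carry) and that at least one VGPR is used; neither assumption is stated in the commit.

```python
VGPRS_PER_BLOCK = 16
MAX_VGPRS = 128


def dvgpr_symbol_value(function_address, num_vgprs):
    """Return the function address plus the VGPR-block offset.

    Hypothetical helper: encodes (number of 16-VGPR blocks - 1)
    in bits 5..3 of the returned value.
    """
    if not 1 <= num_vgprs <= MAX_VGPRS:
        raise ValueError("num_vgprs must be in 1..128 (assumed range)")
    # Assumption: bits 5..3 of the address are zero.
    if function_address & 0x38:
        raise ValueError("address bits 5..3 must be clear (assumed)")
    blocks = -(-num_vgprs // VGPRS_PER_BLOCK)  # ceiling division
    return function_address + ((blocks - 1) << 3)
```

Under these assumptions, a function at `0x1000` using 40 VGPRs occupies 3 blocks, so the offset is `2 << 3` and the symbol value is `0x1010`.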
2025-08-15 16:33:06 +01:00
Aiden Grossman
0b04168948
[CI] Add Basic Bazel Checks (#153740)
Having basic checks (like running buildifier) on the upstream bazel
files would be helpful for contributors maintaining the bazel build. Add
basic checks (currently just buildifier) to a workflow that runs
whenever the bazel build files change.
2025-08-15 08:30:07 -07:00
cmtice
6d3ad9d9fd
[LLDB] Update DIL handling of array subscripting. (#151605)
This updates the DIL code for handling array subscripting to more
closely match and handle all the cases from the original 'frame var'
implementation. Also updates the DIL array subscripting test. This
particularly fixes some issues with handling synthetic children, objc
pointers, and accessing specific bits within scalar data types.
2025-08-15 08:26:45 -07:00
Nikita Popov
11c2240049 [SDAGBuilder] Rename RetTys -> RetVTs (NFC)
Make it clearer that this is a vector of EVTs, not IR types.

Based on:
https://github.com/llvm/llvm-project/pull/153798#discussion_r2279066696
2025-08-15 17:06:33 +02:00
Philip Reames
606937474e
[SDAG] Remove IndexType manipulation in getUniformBase and callers (#151578)
All paths set it to the same value, so just propagate that value to the
consumer.
2025-08-15 08:00:47 -07:00
Florian Hahn
2b1e06598f
[LV] Regenerate some more check lines. (NFC) 2025-08-15 15:53:19 +01:00
Alexey Bataev
13b54f7dc1 [SLP] Recalculate dependencies for potential control dependencies if cleared
If the control dependencies are cleared after calculation of the
copyables, they need to be recalculated unconditionally.

Fixes #153754 #153676
2025-08-15 07:52:10 -07:00
Phoebe Wang
f24d91eb2c
[Headers][X86] Remove duplicate __v8hu, NFCI (#153734)
Newly added in xmmintrin.h by c8312bdd1665225c585dd2b0bff5e46d569edd45
2025-08-15 22:48:59 +08:00
David Green
144f3c4cbf
[AArch64] Adjust the scheduling info of SVE FCMP on Cortex-A510. (#153810)
According to the SWOG, these have a lower throughput than other
instructions. Mark them as taking multiple cycles to model that.
2025-08-15 15:45:33 +01:00
Mikhail R. Gadelha
d7199544af
[libc] Fix mbrtowc test (#153721)
Previously, we were trying to memset a pointer that wasn't being
initialized, and the test would randomly fail.

This PR replaces the pointers with actual objects.
2025-08-15 11:44:33 -03:00
Akash Banerjee
1fd1d63463 [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)
Add a new AutomapToTargetData pass. It gathers the declare target
enter variables that have the AUTOMAP modifier and adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free operations on the AUTOMAP-enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-15 15:41:41 +01:00
Simon Pilgrim
09267f6720 [X86] avx512vp2intersect-builtins.c / avx512vlvp2intersect-builtins.c - add C/C++ and 32/64-bit test coverage 2025-08-15 15:39:12 +01:00
Krishna Pandey
6602d6c7a7
[libc][math][docs] Add documentation for BFloat16 type (#153475)
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
2025-08-15 20:07:33 +05:30
Matt Arsenault
9a14b1d254
RuntimeLibcalls: Generate table of libcall name lengths (#153210)
Avoids strlen when constructing the returned StringRef. We were emitting
these in the libcall name lookup anyway, so split out the offsets for
general use.

Currently emitted as a separate table, not sure if it would be better
to change the string offset table to store pairs of offset and width
instead.
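
To illustrate the idea (not the generated LLVM tables themselves), here is a small sketch of a string blob with parallel offset and length tables. All names are hypothetical. The scanning lookup stands in for `strlen`; the length-table lookup slices the name directly.

```python
def build_tables(names):
    """Pack names into one NUL-separated blob with offset/length tables."""
    blob = ""
    offsets, lengths = [], []
    for name in names:
        offsets.append(len(blob))
        lengths.append(len(name))
        blob += name + "\0"
    return blob, offsets, lengths


def name_by_scan(blob, offsets, index):
    """strlen-style lookup: scan for the terminating NUL."""
    start = offsets[index]
    return blob[start:blob.index("\0", start)]


def name_by_length(blob, offsets, lengths, index):
    """Lookup using the separate length table; no scan is needed."""
    start = offsets[index]
    return blob[start:start + lengths[index]]
```

Storing `(offset, length)` pairs in a single table, the alternative mentioned above, would keep both values next to each other at the cost of changing the existing offset table layout.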
2025-08-15 23:29:10 +09:00
Benjamin Chetioui
8c0914d826
[mlir][bazel] Fix Bazel build after 6bb8f6f2d0ed672217e0a0521afc5b86913b717e (#153811) 2025-08-15 14:28:44 +00:00
Kazu Hirata
f4bc3151bb [mlir] Fix warnings
This patch fixes:

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:82:1: error: unused
  variable 'wasmSectionName<(anonymous
  namespace)::WasmSectionType::DATACOUNT>'
  [-Werror,-Wunused-const-variable]

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:100:5: error: unused
  variable 'valueTypesEncodings' [-Werror,-Wunused-const-variable]

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:735:13: error: unused
  function 'buildLiteralType<unsigned int>'
  [-Werror,-Wunused-function]

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:740:13: error: unused
  function 'buildLiteralType<unsigned long>'
  [-Werror,-Wunused-function]

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:292:33: error: private
  field 'symbols' is not used [-Werror,-Wunused-private-field]
2025-08-15 07:24:31 -07:00
Simon Pilgrim
17dd57b00e [X86] avxvnni-builtins.c / avxvnniint8-builtins.c / avxvnniint16-builtins.c - add C/C++ and 32/64-bit test coverage 2025-08-15 15:17:15 +01:00
Guray Ozen
4c389178ee
[MLIR][NVVM] Print readable modifier (NFC) (#153779)
Currently, the modifier is printed as an address, which is neither
readable nor useful. This PR adds readable printing for it.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-15 15:47:39 +02:00
Guray Ozen
af92cabdef
[MLIR][NVVM] Combine griddepcontrol Ops (#152525)
We have two ops:
1. nvvm.griddepcontrol.wait
2. nvvm.griddepcontrol.launch_dependents

Both relate to Grid Dependent Launch (programmatic dependent launch in
CUDA) and express the same concept. This PR unifies them into a single
op.
2025-08-15 15:47:12 +02:00
Erich Keane
15d7a95ea9
[CIR] Refactor recipe init generation, cleanup after init (#153610)
In preparation for the firstprivate implementation, this separates out
some functions to make the code easier to read.

Additionally, it cleans up the VarDecl->alloca relationship, which will
prevent issues if we have to re-use the same vardecl for a future
generated recipe (and causes concerns in firstprivate later).
2025-08-15 06:41:42 -07:00
Gaëtan Bossu
9828745661
[AArch64][ISel] Select constructive EXT_ZZI pseudo instruction (#152554)
The patch adds patterns to select the EXT_ZZI_CONSTRUCTIVE pseudo
instead of the EXT_ZZI destructive instruction for vector_splice. This
only works when the two inputs to vector_splice are identical.

Given that registers aren't tied anymore, this gives the register
allocator more freedom and a lot of MOVs get replaced with MOVPRFX.

In some cases however, we could have just chosen the same input and
output register, but regalloc preferred not to. This means we end up
with some test cases now having more instructions: there is now a
MOVPRFX while no MOV was previously needed.
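
As a rough element-level model (an assumption for illustration, restricted to a non-negative index and not tied to any LLVM API), splicing a vector with itself is just a rotation of a single source, which is the identical-input case the new pattern covers:

```python
def splice(a, b, imm):
    """Model: concatenate a and b, then take len(a) elements from imm."""
    assert len(a) == len(b) and 0 <= imm < len(a)
    return (a + b)[imm:imm + len(a)]


def rotate_left(v, imm):
    """Rotate a single vector left by imm elements."""
    return v[imm:] + v[:imm]


v = [0, 1, 2, 3]
# With identical inputs, the splice depends on only one source vector.
same_input = splice(v, v, 1)
```

Here `splice(v, v, imm)` and `rotate_left(v, imm)` give the same list for every valid `imm`.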
2025-08-15 14:30:24 +01:00
David Green
649762cb04 Revert "[AArch64][GlobalISel] Add additional vecreduce.fadd and fadd 0.0 tests. NFC"
This reverts commit 16314eb7312dab38d721c70f247f2117e9800704 as the test cases
are failing under EXPENSIVE_CHECKS. Scalar vecreduce.fadd are not valid in
GISel.
2025-08-15 14:23:53 +01:00
Stephen Tozer
bc216b057d
[Debugify] Improve reduction of debugify coverage build output (#150212)
In current DebugLoc coverage builds, the output for any reasonably large
build can become very large if any missing DebugLocs are present; this
happens because single errors in LLVM may result in many errors being
reported in the output report. The main cause of this is that the empty
locations attached to instructions may be propagated to other
instructions in later passes, which will each be reported as new errors.
This patch prevents this by adding an "unknown" annotation to
instructions after reporting them once, ensuring that any other
DebugLocs copied or derived from the original empty location will not be
marked as new errors.

As a separate but related change, this patch updates the report
generation script to deduplicate results using the recorded stacktrace
if they are available, instead of the pass+instruction combination. This
reduces the size of the reduction, but makes the reduction highly
reliable, as the stacktrace allows us to very precisely identify when
two bugs have originated from the same place.
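
The deduplication rule described above can be sketched as follows. This is an illustration, not the actual report generation script: the dictionary keys `stacktrace`, `pass`, and `instruction` are hypothetical field names.

```python
def dedup_key(bug):
    """Prefer the recorded stacktrace; fall back to pass + instruction."""
    stack = bug.get("stacktrace")
    if stack:
        return ("stacktrace", tuple(stack))
    return ("pass+instruction", bug["pass"], bug["instruction"])


def deduplicate(bugs):
    """Keep the first bug seen for each deduplication key."""
    seen = set()
    unique = []
    for bug in bugs:
        key = dedup_key(bug)
        if key not in seen:
            seen.add(key)
            unique.append(bug)
    return unique
```

With this rule, two reports from different passes that share a stacktrace collapse into one entry, while reports without a stacktrace are still grouped by pass and instruction.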
2025-08-15 14:01:04 +01:00
Simon Pilgrim
bcb4984a0b [X86] select-smin-smax.ll - add i128 tests
Helps check quality of legality codegen (all we had was x86 i64 handling)
2025-08-15 13:48:13 +01:00
Simon Pilgrim
263e458273
[X86] select-smin-smax.ll - add i8/i16 test coverage (#153788)
Pulled out of #151893 to show 32/64-bit target coverage
2025-08-15 13:37:11 +01:00
Erick Ochoa Lopez
61caab7789
[mlir][llvm] Add align attribute to llvm.intr.masked.{expandload,compressstore} (#153063)
* Add `requiresArgsAndResultsAttr` to `LLVM_OneResultIntrOp`
* Add `args_attrs` to `llvm.intr.masked.{expandload,compressstore}`

The LLVM intrinsics
[`llvm.intr.masked.expandload`](https://llvm.org/docs/LangRef.html#llvm-masked-expandload-intrinsics)
and
[`llvm.intr.masked.compressstore`](https://llvm.org/docs/LangRef.html#llvm-masked-compressstore-intrinsics)
both allow an optional align parameter attribute to be set which
defaults to one.

Inlining the documentation below for the arguments of
[`llvm.intr.masked.expandload`](https://llvm.org/docs/LangRef.html#id1522) and
[`llvm.intr.masked.compressstore`](https://llvm.org/docs/LangRef.html#id1522),
respectively:

> The `align` parameter attribute can be provided for the first
argument. The pointer alignment defaults to 1.

> The `align` parameter attribute can be provided for the second
argument. The pointer alignment defaults to 1.
2025-08-15 08:34:14 -04:00
Mehdi Amini
69453d7021
[MLIR] Fix memory leak in importWebAssemblyToModule when it fails to import (#153794) 2025-08-15 12:33:25 +00:00
David Spickett
0fca1e4e06 [lldb][lldb-dap][test] Correct skip in TestDAP_launch
Fixes 4f65345ab5f2787a4704efb5828657c50be6d65a

Yet again I forgot it's skip[I]f.
2025-08-15 12:29:26 +00:00
Mehdi Amini
7640645f79
[MLIR][Wasm] Remove statistics as they depend on global ctors (#153795)
Use a debug log instead for now.
2025-08-15 12:29:20 +00:00