548254 Commits

Author SHA1 Message Date
Krzysztof Parzyszek
4b7f3806f6
[flang][OpenMP] Move rewriting of min/max from Lower to Semantics (#153038)
The semantic analysis of the ATOMIC construct will require additional
rewriting (reassociation of certain expressions for user convenience),
and that will be driven by diagnoses made in the semantic checks.

While the rewriting of min/max does not need to be done in semantic
analysis, moving it there keeps all rewriting for the ATOMIC construct
in a single place.
2025-08-12 12:13:50 -05:00
Andy Kaylor
54f53c988d
[CIR] Introduce the CIR global_view attribute (#153044)
This change introduces the #cir.global_view attribute and adds support
for using that attribute to handle initializing a global variable with
the address of another global variable.

This does not yet include support for the optional list of indices to
get an offset from the base address. Those will be added in a follow-up
patch.
2025-08-12 10:02:00 -07:00
Andy Kaylor
7f195b36ee
[CIR] Initialize vptr in dynamic classes (#152574)
This adds support for initializing the vptr member of a dynamic class in
the constructor of that class.

This does not include support for lowering the
`cir.vtable.address_point` operation to the LLVM dialect. That handling
will be added in a follow-up patch.
2025-08-12 10:00:38 -07:00
Andy Kaylor
7f22f5bac1
[CIR] Introduce more cleanup infrastructure (#152589)
Support for normal cleanups was introduced with a simplified
implementation compared to what's in the incubator (which corresponds
closely to the classic codegen implementation).

This change introduces more of the infrastructure that will later be
needed to handle non-trivial cleanup cases, including exception
handling.
2025-08-12 10:00:13 -07:00
Kane Wang
74fbdbf91f
[RISCV][GISel][NFC] Add MIR legalizer tests for G_UADDE (rv32 & rv64) (#152827)
Add MIR tests that exercise legalization of the G_UADDE (unsigned add
with extend/carry) operation for RISC-V targets.
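As a rough illustration of the operation being legalized, here is a sketch of an unsigned add with carry-in and carry-out on fixed-width integers, chained to build a 64-bit add from two 32-bit halves as a target like rv32 might. The function names are hypothetical and this models the arithmetic only, not the GlobalISel legalizer.

```python
MASK32 = (1 << 32) - 1

def uadde(a: int, b: int, carry_in: int, width: int = 32) -> tuple[int, int]:
    """Unsigned add with carry-in; returns (result, carry_out) like G_UADDE."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + (carry_in & 1)
    return total & mask, total >> width

def add64_via_32(a: int, b: int) -> tuple[int, int]:
    """Add two 64-bit values as two 32-bit halves chained through the carry."""
    lo, carry = uadde(a & MASK32, b & MASK32, 0)   # low half: no incoming carry
    hi, carry_out = uadde(a >> 32, b >> 32, carry)  # high half consumes the carry
    return (hi << 32) | lo, carry_out
```

The low half is modelled with a zero carry-in; real legalization may use a different opcode for that first step.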
2025-08-12 09:59:17 -07:00
Farzon Lotfi
544562ebc2
[DirectX] Remove lifetime intrinsics and run Dead Store Elimination (#152636)
Fixes #151764

This fix has two parts. First, we track all lifetime intrinsics, and if
they are users of an alloca of a target extension type such as
dx.RawBuffer, we eliminate those intrinsics when we visit the alloca.

Second, removing them lets us run the Dead Store Elimination pass, which
removes the alloca and simplifies uses of the target extension type back
to plain uses of the global. That keeps the IR in the form the
DXILBitcodeWriter expects.

To do this, we needed to bring back the legacy pass manager plumbing
for the DSE pass and hook it up into the DirectX backend.

The net impact of this change is that DML shader pass rate went from
89.72% (4268 successful compilations) to 90.98% (4328 successful
compilations).
2025-08-12 12:42:08 -04:00
Thurston Dang
219893297b
[sanitizer] Downgrade TestPTrace() Reports to VReport (#152350)
Requested in
https://github.com/llvm/llvm-project/pull/152072#discussion_r2257892739
2025-08-12 09:37:10 -07:00
Orlando Cazalet-Hyams
54f92c7806
[RemoveDIs][AMDGPU] Replace defunct getAssignmentMarkers call (#153212)
Not quite NFC as it looks like the original intrinsic-handling code
never got updated to use records. This was never caught because that
code wasn't tested. I've adjusted an existing test so the behaviour is
now covered.
2025-08-12 17:20:38 +01:00
Krishna Pandey
c819c246f3
[libc][math][c++23] Add bf16div{,f,l,f128} math functions (#153191)
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16div
- bf16divf
- bf16divl
- bf16divf128

---------

Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
2025-08-12 21:46:22 +05:30
modiking
38d854c6e8
[MLIR][NVVM] Update MLIR mapa to reflect new address space (#146031)
The mapa.shared.cluster variant that takes an address-space 3 pointer
should now return an address-space 7 pointer. This patch updates the
NVVMOps.td file to reflect this.
2025-08-12 21:43:51 +05:30
Thurston Dang
9a174518a8
[NFCI][msan] Precommit tests for AVX-VNNI (#153135)
The tests largely cover AVX-VNNI (Vector Neural Network Instructions):
- vpdpbusd, vpdpbusds
- vpdpwssd, vpdpwssds

AVX-VNNI-INT8:
- vpdpbssd, vpdpbssds
- vpdpbsud, vpdpbsuds
- vpdpbuud, vpdpbuuds

AVX-VNNI-INT16:
- vpdpwsud, vpdpwsuds
- vpdpwusd, vpdpwusds
- vpdpwuud, vpdpwuuds

These instructions are currently handled heuristically (by OR'ing
together the vectors). This is incorrect because:
1) multiplication by zero should result in an initialized value
2) the addition is horizontal (within vectors, not "vertically" between
vectors).
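To make both points concrete, here is a sketch of one 32-bit lane of vpdpbusd (four unsigned-byte by signed-byte products summed horizontally into the accumulator, wrapping on overflow), plus a hypothetical per-lane shadow rule in which an uninitialized operand only poisons a product when the other operand is not an initialized zero. This is an illustrative model, not MSan's instrumentation.

```python
def to_s8(v: int) -> int:
    """Reinterpret a byte (0..255) as a signed 8-bit value."""
    return v - 256 if v >= 128 else v

def to_s32(v: int) -> int:
    """Wrap an integer to a signed 32-bit value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v

def vpdpbusd_lane(acc: int, a: list[int], b: list[int]) -> int:
    """One dword lane: acc + sum of u8(a[j]) * s8(b[j]) over four bytes, wrapping."""
    total = acc + sum(a[j] * to_s8(b[j]) for j in range(4))
    return to_s32(total)

def lane_poisoned(acc_poisoned: bool, a: list[int], a_poisoned: list[bool],
                  b: list[int], b_poisoned: list[bool]) -> bool:
    """Hypothetical shadow rule: any poisoned product poisons the whole lane."""
    if acc_poisoned:
        return True
    for j in range(4):
        a_known_zero = not a_poisoned[j] and a[j] == 0
        b_known_zero = not b_poisoned[j] and b[j] == 0
        # An initialized zero makes the product initialized regardless of the other side.
        if (a_poisoned[j] or b_poisoned[j]) and not (a_known_zero or b_known_zero):
            return True
    return False
```

Because all four byte products feed one dword, poison in any byte position affects the whole lane, which element-wise OR'ing does not capture.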

Future work can improve the instrumentation by applying the updated
handleVectorPmaddIntrinsic() from
https://github.com/llvm/llvm-project/pull/152941
2025-08-12 09:12:54 -07:00
Thurston Dang
457b14c327
Reapply "[asan] Fix misalignment of variables in fake stack frames" (#153139) (#153142)
This reverts commit 29ad073c6c325dbf92c1aa5a285ca48e55cb918b i.e.,
relands 927e19f5f3b357823f86f6c4f1378abedccadf27.
    
It was reverted because of buildbot breakages. This reland adds
"-pthread" and also moves the test to Posix-only.
    
Original commit message:
    
ASan's instrumentation pass uses
`ASanStackFrameLayout::ComputeASanStackFrameLayout()` to calculate the
offset of variables, taking into account alignment. However, the fake
stack frames returned by the runtime's `GetFrame()` are not guaranteed
to be sufficiently aligned (and in some cases, even guaranteed to be
misaligned), hence the offset addresses may sometimes be misaligned.
    
This change fixes the misalignment issue by padding the FakeStack. Every
fake stack frame is guaranteed to be aligned to the size of the frame.
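A minimal sketch of why padding the start of the fake stack is enough, assuming frame sizes are powers of two no larger than 64 KiB (consistent with the quoted 64KB overhead) and that each size class gets an equally sized region whose size is a multiple of 64 KiB. The names and layout are hypothetical, not ASan's FakeStack code.

```python
MAX_FRAME_SIZE = 64 * 1024  # assumption: the largest fake frame is 64 KiB

def align_up(addr: int, alignment: int) -> int:
    """Round addr up to a multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)

def frame_addresses(raw_base: int, region_bytes: int,
                    frame_sizes: list[int]) -> dict[int, list[int]]:
    """Lay out one region per frame size, starting at a padded, aligned base.

    Assumes region_bytes is a multiple of MAX_FRAME_SIZE and every frame size is a
    power of two <= MAX_FRAME_SIZE, so each frame ends up aligned to its own size.
    """
    base = align_up(raw_base, MAX_FRAME_SIZE)  # padding is always < MAX_FRAME_SIZE
    layout = {}
    for class_id, size in enumerate(frame_sizes):
        region_start = base + class_id * region_bytes
        layout[size] = [region_start + i * size for i in range(region_bytes // size)]
    return layout
```

Aligning once to the largest frame size makes every smaller power-of-two frame aligned too, which is why the overhead is bounded by a single maximum-size frame.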
    
The memory overhead is low: 64KB per FakeStack, compared to the
FakeStack size of ~700KB (min) to 11MB (max).
    
Updates the test case from
https://github.com/llvm/llvm-project/pull/152889.
2025-08-12 09:11:57 -07:00
Amr Hesham
475aa1b1a1
[CIR] CompoundAssignment from ComplexType to ScalarType (#152915)
This change adds support for compound assignment from ComplexType to a
scalar type and updates our approach to emitting binary operations
between complex and scalar operands.

https://github.com/llvm/llvm-project/issues/141365
2025-08-12 18:01:31 +02:00
Keith Randall
03372c7782
Revert "[libFuzzer] always install signal handler with SA_ONSTACK" (#153114)
Reverts llvm/llvm-project#147422

This seems to be causing problems with tracebacks. Probably the traceback
code doesn't know how to switch back to the regular stack after it gets
to the top of the signal stack.
2025-08-12 08:52:58 -07:00
Nathan Gauër
6abbfcae6e
[SPIR-V] Fix OpVectorShuffle undef emission (#151993)
When an undef/poison value is lowered as an immediate, it becomes -1.
When this reached the backend, the -1 was printed as an operand of
OpVectorShuffle instead of the proper 0xFFFFFFFF.

From the SPIR-V spec:
  A Component literal may also be FFFFFFFF, which means the
  corresponding result component has no source and is undefined.
  
The existing tests passed `spirv-val` because the binary format was
used as output, so the `-1` was encoded as `0xFFFFFFFF`. With the text
format, however, `-1` was emitted as-is, which is wrong.
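A small sketch of why the binary path happened to work: -1 and 0xFFFFFFFF encode to the same 32-bit word, but their textual forms differ, so the undef lane has to be mapped to the unsigned literal before printing. The helper names are hypothetical and little-endian words are assumed.

```python
import struct

UNDEF_COMPONENT = 0xFFFFFFFF  # SPIR-V component literal: no source, undefined

def shuffle_literals(mask: list[int]) -> list[int]:
    """Map a shuffle mask that uses -1 for undef/poison lanes to SPIR-V literals."""
    return [UNDEF_COMPONENT if m == -1 else m for m in mask]

def encode_word(value: int) -> bytes:
    """Encode an immediate as one little-endian 32-bit word, as a binary writer would."""
    return struct.pack("<I", value & 0xFFFFFFFF)
```

For example, `encode_word(-1)` and `encode_word(0xFFFFFFFF)` produce identical bytes, while `str(-1)` and `str(0xFFFFFFFF)` do not.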

Fixes #151691
2025-08-12 15:50:48 +00:00
Dan Salvato
b09b05a83e
[M68k] Fix incorrect boolean content type (#152572)
M68k's SETCC instruction (`scc`) fills the destination byte with all 1s
(`0xff`) when the condition holds. If boolean contents are set to
`ZeroOrOneBooleanContent`, LLVM can mistakenly assume the destination
holds `0x01` instead of `0xff` and emit broken code as a result. This
change corrects the boolean content type to
`ZeroOrNegativeOneBooleanContent`.

For example, this IR:

```llvm
define dso_local signext range(i8 0, 2) i8 @testBool(i32 noundef %a) local_unnamed_addr #0 {
entry:
  %cmp = icmp eq i32 %a, 4660
  %. = zext i1 %cmp to i8
  ret i8 %.
}
```

would previously build as:

```asm
testBool:                               ; @testBool
	cmpi.l	#4660, (4,%sp)
	seq	%d0
	and.l	#255, %d0
	rts
```

Notice that the `zext` erroneously masks with 255 instead of 1, so bits
1-7 set by `seq` are not cleared and the register returns 255 instead
of 1. This patch fixes the issue:

```asm
testBool:                               ; @testBool
	cmpi.l	#4660, (4,%sp)
	seq	%d0
	and.l	#1, %d0
	rts
```

Most of the tests containing `scc` suffered from the same value error as
described above, so those tests have been updated to match the new
output (which also logically corrects them).
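The arithmetic behind the bug can be modelled in a few lines: `seq` leaves `0xff` (that is, -1 as a signed byte) for true, so zero-extending an i1 must mask with 1, and masking with 255 keeps the stray bits. This is a hypothetical model of the two instructions, not backend code.

```python
def seq(cond: bool) -> int:
    """Model of M68k `seq` on the low byte: all ones when true, zero when false."""
    return 0xFF if cond else 0x00

def as_signed_byte(v: int) -> int:
    """Interpret a byte as signed, showing 0xff is 'negative one'."""
    return v - 256 if v >= 128 else v

def zext_i1(byte: int, mask: int) -> int:
    """Model the `and.l #mask, %d0` used to implement zext i1."""
    return byte & mask
```

With `ZeroOrOneBooleanContent`, LLVM believed `and.l #255` was sufficient; under `ZeroOrNegativeOneBooleanContent` the mask of 1 is required.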
2025-08-12 08:46:41 -07:00
Koakuma
111219ed27
[SPARC] Use FMA instructions when we have UA2007 (#148434) 2025-08-12 22:46:00 +07:00
Benjamin Chetioui
6f3b3604bc
Revert "[ADT] Simplify getFirstEl (NFC)" (#153201)
Reverts llvm/llvm-project#153127

This broke ubsan:
https://lab.llvm.org/buildbot/#/builders/25/builds/10649.
2025-08-12 15:41:49 +00:00
moorabbit
f8653cecd1
[Clang][X86] Replace F16C vcvtph2ps/256 intrinsics with (convert|shuffle)vector builtins (#152911)
The following intrinsics were replaced by a combination of
`__builtin_shufflevector` and `__builtin_convertvector`:
- `__builtin_ia32_vcvtph2ps`
- `__builtin_ia32_vcvtph2ps256`
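To show why the 128-bit form needs a shuffle before the conversion, here is a sketch of the semantics: the 128-bit source holds eight half-precision values, but only the low four are converted, whereas the 256-bit form converts all eight. The function names are hypothetical, and Python floats (binary64) stand in for float, which is exact for half inputs.

```python
import struct

def cvtph_ps_128(src: bytes) -> list[float]:
    """Model of the 128-bit form: convert the low 4 of 8 packed halves."""
    halves = struct.unpack("<8e", src)       # 16 bytes viewed as 8 x half
    low = [halves[i] for i in (0, 1, 2, 3)]  # the shufflevector step
    return [float(h) for h in low]           # the convertvector step (exact widening)

def cvtph_ps_256(src: bytes) -> list[float]:
    """Model of the 256-bit form: convert all 8 packed halves."""
    return [float(h) for h in struct.unpack("<8e", src)]
```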

Fixes #152749
2025-08-12 16:32:19 +01:00
Abid Qadeer
62d0b712b7
[OMPIRBuilder] Avoid invalid debug location. (#153190)
Fixes #153043.

This is another case of the debug location not being updated when the
insert point is changed by `restoreIP`. Fixed by using the wrapper
function that also updates the debug location.
2025-08-12 16:20:52 +01:00
Krishna Pandey
8c5e9399f6
[libc][math][c++23] Add f{max,min}imum{,_mag,_mag_num,_num}bf16 math functions (#152881)
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- fmaximumbf16
- fmaximum_magbf16
- fmaximum_mag_numbf16
- fmaximum_numbf16
- fminimumbf16
- fminimum_magbf16
- fminimum_mag_numbf16
- fminimum_numbf16
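The suffixes select different NaN and magnitude rules. As a sketch of the C23 semantics these names follow, here are the maximum variants on Python floats (binary64 rather than bf16), ignoring signaling NaNs and floating-point exceptions; the fminimum variants mirror them with the comparisons reversed and -0.0 preferred over +0.0.

```python
import math

def fmaximum(x: float, y: float) -> float:
    """Maximum that propagates NaN and orders -0.0 below +0.0."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == y:
        # Equal values can only differ in the sign of zero; prefer +0.0.
        return y if math.copysign(1.0, x) < 0 else x
    return x if x > y else y

def fmaximum_num(x: float, y: float) -> float:
    """Like fmaximum, but a number wins over a NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return fmaximum(x, y)

def fmaximum_mag(x: float, y: float) -> float:
    """Operand with the larger magnitude; ties fall back to fmaximum."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if abs(x) != abs(y):
        return x if abs(x) > abs(y) else y
    return fmaximum(x, y)

def fmaximum_mag_num(x: float, y: float) -> float:
    """Like fmaximum_mag, but a number wins over a NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return fmaximum_mag(x, y)
```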

---------

Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
2025-08-12 20:37:31 +05:30
Mikhail R. Gadelha
d455d45654
[RISCV][VLOPT] Added support for several vector crypto instructions (#153071)
This PR adds support for the following instructions to the RISC-V
VLOptimizer: vandn.vx, vandn.vv, vbrev.v, vclz.v, vcpop.v, vctz.v,
vror.vi, vror.vx, vror.vv, vrol.vx, vrol.vv.
2025-08-12 12:05:03 -03:00
Sergei Barannikov
2f9f92ad01
[TableGen] Use getValueAsOptionalDef to simplify code (NFC) (#153170) 2025-08-12 17:44:01 +03:00
Philip Reames
d8ce19ae6b Build fix after bbde6b 2025-08-12 07:35:06 -07:00
Tommaso Fellegara
54d0061809
[Utils] update_llc_test_checks.py: updated the regexp for ARM target (#148287)
Fixes #147485.

I changed the regexp for the ARM targets to make the part `@+[\t]*@"?(?P=func)"?` optional: that part is not generated when `-asm-verbose=false` is passed, which led to the issue.
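A simplified, hypothetical label matcher (not the script's actual ARM regex, which matches more lines) shows the technique: wrapping the `@ @func` comment in an optional non-capturing group lets the same pattern match both verbose and non-verbose output.

```python
import re

# Hypothetical simplified pattern; the "@ @func" comment only appears in verbose asm.
ARM_LABEL_RE = re.compile(
    r'^(?P<func>[0-9a-zA-Z_$]+):'           # function label
    r'(?:[ \t]*@+[ \t]*@"?(?P=func)"?)?'    # optional "@ @func" comment
    r'[ \t]*$',
    re.MULTILINE,
)

def find_functions(asm: str) -> list[str]:
    """Return the names of all function labels found in the assembly text."""
    return [m.group("func") for m in ARM_LABEL_RE.finditer(asm)]
```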
2025-08-12 15:31:07 +01:00
Simon Pilgrim
bd3aa88802
[Headers][X86] Allow SSE MOVD/Q scalar<->vector cvt intrinsics to be used in constexpr (#153192) 2025-08-12 15:29:16 +01:00
Akash Banerjee
4e6d510eb3
[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)
Add a new AutomapToTargetData pass. This pass gathers the declare target
enter variables that have the AUTOMAP modifier and adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free operations on those variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-12 15:18:15 +01:00
Sergei Barannikov
c69355e7d1
[TableGen] Use getValueInit to reduce code duplication (NFC) (#153167) 2025-08-12 17:18:00 +03:00
Rahul Joshi
89ea9df6a2
[NFCI][TableGen] Minor improvements to Intrinsic::getAttributes (#152761)
This change implements several small improvements to
`Intrinsic::getAttributes`:

1. Use `SequenceToOffsetTable` to emit `ArgAttrIdTable`. This enables
reuse of entries when they share a common prefix, reducing the size of
this table from 546 to 484 entries and saving 248 bytes.
2. Fix `AttributeComparator` to compare only argument attributes and
not look at function attributes. This avoids unnecessary duplicates in
the uniquing process and eliminates 2 entries from
`ArgAttributesInfoTable`, saving 8 bytes.
3. Improve the `Intrinsic::getAttributes` code so that it does not
always initialize every entry of `AS`. Currently, we initialize all
entries of the array `AS` even if we may not use all of them. In
addition to the runtime cost, in Clang release builds the initialization
loop is unrolled and consumes ~330 bytes of code. Address this by
declaring the storage for `AS` as a char array with appropriate
`alignas` (similar to how `SmallVectorStorage` defines its inline
elements).
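As a rough sketch of the entry reuse in item 1 (62 fewer entries for 248 bytes works out to 4 bytes per entry), here is a greedy packer that lets a sequence which is a prefix of an already placed one reuse its start offset. The prefix-sharing strategy follows the wording above and is an illustrative assumption, not the `SequenceToOffsetTable` algorithm.

```python
def build_offset_table(seqs: list[tuple[int, ...]]) -> tuple[list[int], dict[tuple[int, ...], int]]:
    """Pack sequences into one flat table, reusing entries for shared prefixes."""
    table: list[int] = []
    offsets: dict[tuple[int, ...], int] = {}
    placed: list[tuple[int, ...]] = []
    # Place longer sequences first so shorter ones can point into them.
    for seq in sorted(set(seqs), key=lambda s: (-len(s), s)):
        for other in placed:
            if other[:len(seq)] == seq:
                offsets[seq] = offsets[other]  # seq is a prefix of other: reuse it
                break
        else:
            offsets[seq] = len(table)
            table.extend(seq)
            placed.append(seq)
    return table, offsets
```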
2025-08-12 07:15:08 -07:00
Gao Yanfeng
24f5385a85
[MLIR][NVVM] Support generating all the ldmatrix intrinsics from NVVM ops (#148783)
Previously, the NVVM dialect's ldmatrix operation could only generate a
limited subset of the available NVVM ldmatrix intrinsics; in particular,
the intrinsics for the new variants introduced in Blackwell were not
accessible through the NVVM ops. This commit extends the ldmatrix
operation to support all available ldmatrix intrinsics.
2025-08-12 15:13:15 +01:00
Akash Banerjee
e1a694cd16 [NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls 2025-08-12 15:06:03 +01:00
Amit Tiwari
2074e1320f
[Clang][OpenMP] Non-contiguous strided update (#144635)
This patch handles the strided update in the `#pragma omp target update
from(data[a:b:c])` directive, where 'c' is the stride, which leads to a
non-contiguous update of the `data` array when the offloaded execution
returns control from the device to the host via the `from` clause.

Issue: in Clang CodeGen, where map information is generated for a
particular `MapType` (to, from, etc.), the strided access was not
detected. Because of this, the `MapType` bits passed to the runtime
were incorrect, which led to incorrect (contiguous) execution in the
libomptarget runtime code.
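For reference, OpenMP array sections use the form `[lower-bound : length : stride]`. A small sketch with hypothetical helpers lists the elements such a section touches and shows why any stride greater than 1 makes the transfer non-contiguous.

```python
def section_indices(lower: int, length: int, stride: int = 1) -> list[int]:
    """Indices touched by the OpenMP array section data[lower:length:stride]."""
    return [lower + i * stride for i in range(length)]

def is_contiguous(indices: list[int]) -> bool:
    """True if the indices form one unbroken run."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))
```

For example, `data[2:4:3]` covers indices 2, 5, 8 and 11, so the elements in between must not be copied back.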

Added a minimal testcase that verifies the working of the patch.
2025-08-12 19:32:15 +05:30
Simon Pilgrim
72b53cde1c [X86] xop-builtins.c - add C/C++ test coverage 2025-08-12 14:44:10 +01:00
Simon Pilgrim
9442b4ea25 [X86] mmx-builtins.c - use __v8qs initializer instead of _mm_setr_pi8 to correctly run on -fno-signed-char targets 2025-08-12 14:44:10 +01:00
Elizaveta Noskova
bbde6be841
[llvm] Support multiple save/restore points in mir (#119357)
Currently mir supports only one save and one restore point
specification:

```
  savePoint:       '%bb.1'
  restorePoint:    '%bb.2'
```

This patch makes it possible to specify multiple save and multiple
restore points in MIR:

```
  savePoints:
    - point:           '%bb.1'
  restorePoints:
    - point:           '%bb.2'
```

Shrink-Wrap points split Part 3.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581

Part 1: https://github.com/llvm/llvm-project/pull/117862
Part 2: https://github.com/llvm/llvm-project/pull/119355
Part 4: https://github.com/llvm/llvm-project/pull/119358
Part 5: https://github.com/llvm/llvm-project/pull/119359
2025-08-12 16:34:29 +03:00
Ricardo Jesus
ef5e65d27b
[AArch64] Fix stp kill when merging forward. (#152994)
As an alternative to #149177, iterate through all instructions in
`AArch64LoadStoreOptimizer`.
2025-08-12 14:19:43 +01:00
Akash Banerjee
c1f410779a Revert "[NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls"
This reverts commit b8104fa320f006bacd3e16afb431b5980dd5000a.
2025-08-12 14:18:57 +01:00
Matthias Springer
ef2b8805bf
[mlir][vector] Implement InferTypeOpInterface on vector.to_elements (#153172)
Just for convenience. This auto-generates an additional builder that
infers the result type.
2025-08-12 15:15:30 +02:00
Michał Górny
475921d2dc
[runtimes] Append -nostd*++ flags only for Clang (#151930)
Append `-nostdlib++` and `-nostdinc++` flags to `CMAKE_REQUIRED_FLAGS`
only if we are actually building with Clang.  These flags are also
passed to the C compiler, which is not allowed in GCC.  Since CMake
implicitly performs some tests using the C compiler, this can lead
to incorrect check results.  This should be safe, since FWIU we only
need them when bootstrapping Clang.

Even though we know that Clang supports these flags, we still need
to explicitly check if they work, as in some scenarios adding
`-nostdlib++` actually breaks the build.  See PR #108357 for examples
of that.

Fixes #90332

Signed-off-by: Michał Górny <mgorny@gentoo.org>
2025-08-12 15:14:30 +02:00
Florian Hahn
424258947e
[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879)
Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.

This is one of the last few steps needed to model the full vector
skeleton in VPlan.

This is mostly NFC, although in some cases we remove some unused
computations.

PR: https://github.com/llvm/llvm-project/pull/152879
2025-08-12 14:13:13 +01:00
Nikita Popov
9d96d01b42 [IR] Add offset stripping test with mixed const/variable offsets (NFC)
Regression test for:
a7edc95c79 (commitcomment-163691175)
2025-08-12 15:12:16 +02:00
Leon Clark
9115bef8ee
[VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138)
Reopen #128938.

Attempt to shrink the size of vector loads where only some of the
incoming lanes are used for rebroadcasts in shufflevector instructions.
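As a rough sketch of the lane analysis involved, the helper below (hypothetical, single-source masks only, with -1 for poison lanes) finds the highest loaded lane any shuffle reads; if that is below the vector width, a load of just the leading lanes would suffice. The real transform also weighs target costs, which this ignores.

```python
def shrunk_load_width(num_elts: int, masks: list[list[int]]) -> int:
    """Smallest number of leading lanes a load must provide for these shuffles.

    Mask entries are lane indices into the loaded vector, or -1 for poison lanes.
    """
    used = [m for mask in masks for m in mask if m != -1]
    if any(m >= num_elts for m in used):
        raise ValueError("mask refers to the second shuffle operand")
    return max(used, default=-1) + 1
```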

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-12 14:08:37 +01:00
Akash Banerjee
b8104fa320 [NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls 2025-08-12 14:05:00 +01:00
Petar Avramovic
f88be47fbf
AMDGPU/GlobalISel: Switch a few tests to new-reg-bank-select (#153174) 2025-08-12 15:03:31 +02:00
choikwa
1d30f71b21
[AMDGPU] Make ds/global load intrinsics IntrArgMemOnly (#152792)
This along with IntrReadMem means that the Intrinsic only reads memory
through the given argument ptr and its derivatives. This allows passes
like Inliner to attach alias.scope to the call instruction as it sees
that no other memory is accessed.

Discovered via SWDEV-543741

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-08-12 21:51:39 +09:00
David Green
5d099c2831 [AArch64][GlobalISel] Add 128bit insert and extract vector test coverage. NFC 2025-08-12 13:50:36 +01:00
Igor Wodiany
0f346a48a8
[mlir][spirv] Enable serializer to write SPIR-V modules into separate files (#152678)
By default, `mlir-translate` writes all output into a single file even
when `--split-input-file` is used. This is not an issue for text files
as they can be easily split with an output separator. However, this
causes issues with binary SPIR-V modules.

Firstly, a binary file with multiple modules is not valid SPIR-V, but
one will be created if multiple modules are specified in the same file
and separated by "// -----". This does not cause issues with MLIR's
internal tools but does not work with SPIRV-Tools.

Secondly, splitting binary files after serialization is non-trivial
compared to text files, so relying on an external tool is not desirable.

This patch adds a SPIR-V serialization option that writes SPIR-V modules
to separate files in addition to writing them to the `mlir-translate`
output file. This is not the ideal solution; ideally `mlir-translate`
would allow generating multiple output files when `--split-input-file`
is used. However, adding such functionality is again non-trivial due to
how processing and splitting is done: output is written to a single
`os` that is passed around, and the number of split buffers is not
known ahead of time. As such, I propose a SPIR-V internal option that
dumps modules to files in a form that can be processed by `spirv-val`.
The behaviour of the newly added argument may be confusing, but it
benefits from being internal to the SPIR-V target.

Alternatively, we could expose the spirv option in
`mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp`, and slice the
output file on the SPIR-V magic number, and not keep the file generated
by default by `mlir-translate`. This would be a bit cleaner in API
sense, as it would not generate the additional file containing all
modules together. However, it pushes SPIR-V specific code into the
generic part of `mlir-translate`, and slicing is potentially more
error prone than just writing a single module after it was serialized.
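To illustrate the alternative and its risk, here is a naive sketch that slices a concatenated binary at every word equal to the SPIR-V magic number 0x07230203, assuming little-endian words. The function is hypothetical; note how an ordinary operand word that happens to equal the magic number causes a bogus split.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module

def split_spirv_modules(blob: bytes) -> list[bytes]:
    """Naively split concatenated little-endian SPIR-V modules at each magic word."""
    if len(blob) % 4:
        raise ValueError("SPIR-V binaries are a sequence of 32-bit words")
    words = struct.unpack(f"<{len(blob) // 4}I", blob)
    starts = [i for i, w in enumerate(words) if w == SPIRV_MAGIC]
    if not starts or starts[0] != 0:
        raise ValueError("blob does not start with a SPIR-V module")
    bounds = starts + [len(words)]
    return [blob[4 * s:4 * e] for s, e in zip(bounds, bounds[1:])]
```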
2025-08-12 13:48:39 +01:00
Orlando Cazalet-Hyams
ba5ff57917
[Dexter] Track DAP capabilities (#152715) 2025-08-12 13:47:33 +01:00
XChy
2a49719525
[SelectionDAGBuilder] Look for appropriate INLINEASM_BR instruction to verify (#152591)
Partially fix #149023.
The original code `MRI.def_begin(Reg)->getParent()` may return the
incorrect MI, as the physical register `Reg` may have multiple
definitions.
This patch selects the correct MI to verify by comparing the MBB of each
definition.

The new test case hangs with -O1/-O2/-O3 enabled; BranchFolding may be
to blame.
2025-08-12 12:37:56 +00:00
Andrei Safronov
48da8489f2
[Xtensa] Add esp32/esp8266 cpus implementation. (#152409)
Add Xtensa esp32 and esp8266 cpus. Implement target parser to recognise
Xtensa hardware features.
2025-08-12 15:17:36 +03:00