llvm-project

Author	SHA1	Message	Date
Krzysztof Parzyszek	4b7f3806f6	[flang][OpenMP] Move rewriting of min/max from Lower to Semantics (#153038 ) The semantic analysis of the ATOMIC construct will require additional rewriting (reassociation of certain expressions for user convenience), and that will be driven by diagnoses made in the semantic checks. While the rewriting of min/max is not required to be done in semantic analysis, moving it there will keep all rewriting for the ATOMIC construct in a single location.	2025-08-12 12:13:50 -05:00
Andy Kaylor	54f53c988d	[CIR] Introduce the CIR global_view attribute (#153044 ) This change introduces the #cir.global_view attribute and adds support for using that attribute to handle initializing a global variable with the address of another global variable. This does not yet include support for the optional list of indices to get an offset from the base address. Those will be added in a follow-up patch.	2025-08-12 10:02:00 -07:00
Andy Kaylor	7f195b36ee	[CIR] Initialize vptr in dynamic classes (#152574 ) This adds support for initializing the vptr member of a dynamic class in the constructor of that class. This does not include support for lowering the `cir.vtable.address_point` operation to the LLVM dialect. That handling will be added in a follow-up patch.	2025-08-12 10:00:38 -07:00
Andy Kaylor	7f22f5bac1	[CIR] Introduce more cleanup infrastructure (#152589 ) Support for normal cleanups was introduced with a simplified implementation compared to what's in the incubator (which corresponds closely to the classic codegen implementation). This change introduces more of the infrastructure that will later be needed to handle non-trivial cleanup cases, including exception handling.	2025-08-12 10:00:13 -07:00
Kane Wang	74fbdbf91f	[RISCV][GISel][NFC] Add MIR legalizer tests for G_UADDE (rv32 & rv64) (#152827 ) Add MIR tests that exercise legalization of the G_UADDE (unsigned add with extend/carry) operation for RISC-V targets.	2025-08-12 09:59:17 -07:00
Farzon Lotfi	544562ebc2	[DirectX] Remove lifetime intrinsics and run Dead Store Elimination (#152636 ) Fixes #151764. This fix has two parts. First, we track all lifetime intrinsics, and if they are users of an alloca of a target extension like dx.RawBuffer, we eliminate those memory intrinsics when we visit the alloca. We do step one to allow us to use the Dead Store Elimination pass. This removes the alloca and simplifies the use of the target extension back to using just the global. That keeps things in a form the DXILBitcodeWriter is expecting. To pull this off, we needed to bring back the legacy pass manager plumbing for the DSE pass and hook it up into the DirectX backend. The net impact of this change is that the DML shader pass rate went from 89.72% (4268 successful compilations) to 90.98% (4328 successful compilations).	2025-08-12 12:42:08 -04:00
Thurston Dang	219893297b	[sanitizer] Downgrade TestPTrace() Reports to VReport (#152350 ) Requested in https://github.com/llvm/llvm-project/pull/152072#discussion_r2257892739	2025-08-12 09:37:10 -07:00
Orlando Cazalet-Hyams	54f92c7806	[RemoveDIs][AMDGPU] Replace defunct getAssignmentMarkers call (#153212 ) Not quite NFC as it looks like the original intrinsic-handling code never got updated to use records. This was never caught because that code wasn't tested. I've adjusted an existing test so the behaviour is now covered.	2025-08-12 17:20:38 +01:00
Krishna Pandey	c819c246f3	[libc][math][c++23] Add bf16div{,f,l,f128} math functions (#153191 ) This PR adds the following basic math functions for BFloat16 type along with the tests: - bf16div - bf16divf - bf16divl - bf16divf128 --------- Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-12 21:46:22 +05:30
modiking	38d854c6e8	[MLIR][NVVM] Update MLIR mapa to reflect new address space (#146031 ) The mapa.shared.cluster variant that takes in address-space 3 now should output address-space 7. This patch updates the NVVMOps.td file to reflect this.	2025-08-12 21:43:51 +05:30
Thurston Dang	9a174518a8	[NFCI][msan] Precommit tests for AVX-VNNI (#153135 ) The tests largely cover AVX-VNNI (Vector Neural Network Instructions): - vpdpbusd, vpdpbusds - vpdpwssd, vpdpwssds AVX-VNNI-INT8: - vpdpbssd, vpdpbssds - vpdpbsud, vpdpbsuds - vpdpbuud, vpdpbuuds AVX-VNNI-INT16: - vpdpwsud, vpdpwsuds - vpdpwusd, vpdpwusds - vpdpwuud, vpdpwuuds These instructions are currently heuristically handled (by OR'ing together the vectors). This is incorrect because: 1) multiplication by a zero should result in an initialized value 2) the addition is horizontal (within vectors, not "vertically" between vectors). Future work can improve the instrumentation by applying the updated handleVectorPmaddIntrinsic() from https://github.com/llvm/llvm-project/pull/152941	2025-08-12 09:12:54 -07:00
Thurston Dang	457b14c327	Reapply "[asan] Fix misalignment of variables in fake stack frames" (#153139 ) (#153142 ) This reverts commit 29ad073c6c325dbf92c1aa5a285ca48e55cb918b i.e., relands 927e19f5f3b357823f86f6c4f1378abedccadf27. It was reverted because of buildbot breakages. This reland adds "-pthread" and also moves the test to Posix-only. Original commit message: ASan's instrumentation pass uses `ASanStackFrameLayout::ComputeASanStackFrameLayout()` to calculate the offset of variables, taking into account alignment. However, the fake stack frames returned by the runtime's `GetFrame()` are not guaranteed to be sufficiently aligned (and in some cases, even guaranteed to be misaligned), hence the offset addresses may sometimes be misaligned. This change fixes the misalignment issue by padding the FakeStack. Every fake stack frame is guaranteed to be aligned to the size of the frame. The memory overhead is low: 64KB per FakeStack, compared to the FakeStack size of ~700KB (min) to 11MB (max). Updates the test case from https://github.com/llvm/llvm-project/pull/152889.	2025-08-12 09:11:57 -07:00
Amr Hesham	475aa1b1a1	[CIR] CompoundAssignment from ComplexType to ScalarType (#152915 ) This change adds support for the CompoundAssignment for ComplexType and updates our approach for emitting bin op between Complex & Scalar https://github.com/llvm/llvm-project/issues/141365	2025-08-12 18:01:31 +02:00
Keith Randall	03372c7782	Revert "[libFuzzer] always install signal handler with SA_ONSTACK" (#153114 ) Reverts llvm/llvm-project#147422 Seems to be causing problems with tracebacks. Probably the traceback code doesn't know how to switch back to the regular stack after it gets to the top of the signal stack.	2025-08-12 08:52:58 -07:00
Nathan Gauër	6abbfcae6e	[SPIR-V] Fix OpVectorShuffle undef emission (#151993 ) When an undef/poison value is lowered as an immediate, it becomes -1. When reaching the backend, the -1 was printed as operand to OpVectorShuffle instead of the proper 0xFFFFFFFF. From the SPIR-V spec: A Component literal may also be FFFFFFFF, which means the corresponding result component has no source and is undefined. The existing tests were passing `spirv-val` because the binary format was used as output, meaning the `-1` was lowered to `0xFFFFFFFF`. But when the text format is used, `-1` is emitted as-is, which is wrong. Fixes #151691	2025-08-12 15:50:48 +00:00
Dan Salvato	b09b05a83e	[M68k] Fix incorrect boolean content type (#152572 ) M68k's SETCC instruction (`scc`) distinctly fills the destination byte with all 1s. If boolean contents are set to `ZeroOrOneBooleanContent`, LLVM can mistakenly think the destination holds `0x01` instead of `0xff` and emit broken code as a result. This change corrects the boolean content type to `ZeroOrNegativeOneBooleanContent`. For example, this IR: ```llvm define dso_local signext range(i8 0, 2) i8 @testBool(i32 noundef %a) local_unnamed_addr #0 { entry: %cmp = icmp eq i32 %a, 4660 %. = zext i1 %cmp to i8 ret i8 %. } ``` would previously build as: ```asm testBool: ; @testBool cmpi.l #4660, (4,%sp) seq %d0 and.l #255, %d0 rts ``` Notice the `zext` is erroneously not clearing the low bits, and thus the register returns with 255 instead of 1. This patch fixes the issue: ```asm testBool: ; @testBool cmpi.l #4660, (4,%sp) seq %d0 and.l #1, %d0 rts ``` Most of the tests containing `scc` suffered from the same value error as described above, so those tests have been updated to match the new output (which also logically corrects them).	2025-08-12 08:46:41 -07:00
Koakuma	111219ed27	[SPARC] Use FMA instructions when we have UA2007 (#148434 )	2025-08-12 22:46:00 +07:00
Benjamin Chetioui	6f3b3604bc	Revert "[ADT] Simplify getFirstEl (NFC)" (#153201 ) Reverts llvm/llvm-project#153127 This broke ubsan: https://lab.llvm.org/buildbot/#/builders/25/builds/10649.	2025-08-12 15:41:49 +00:00
moorabbit	f8653cecd1	[Clang][X86] Replace F16C `vcvtph2ps/256` intrinsics with `(convert\|shuffle)vector` builtins (#152911 ) The following intrinsics were replaced by a combination of `__builtin_shufflevector` and `__builtin_convertvector`: - `__builtin_ia32_vcvtph2ps` - `__builtin_ia32_vcvtph2ps256` Fixes #152749	2025-08-12 16:32:19 +01:00
Abid Qadeer	62d0b712b7	[OMPIRBuilder] Avoid invalid debug location. (#153190 ) Fixes #153043. This is another case of debug location not getting updated when the insert point is changed by the `restoreIP`. Fixed by using the wrapper function that updates the debug location.	2025-08-12 16:20:52 +01:00
Krishna Pandey	8c5e9399f6	[libc][math][c++23] Add f{max,min}imum{,_mag,_mag_num,_num}bf16 math functions (#152881 ) This PR adds the following basic math functions for BFloat16 type along with the tests: - fmaximumbf16 - fmaximum_magbf16 - fmaximum_mag_numbf16 - fmaximum_numbf16 - fminimumbf16 - fminimum_magbf16 - fminimum_mag_numbf16 - fminimum_numbf16 --------- Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-12 20:37:31 +05:30
Mikhail R. Gadelha	d455d45654	[RISCV][VLOPT] Added support for several vector crypto instructions (#153071 ) This PR adds support for the following instructions to the RISC-V VLOptimizer: vandn.vx, vandn.vv, vbrev.v, vclz.v, vcpop.v, vctz.v, vror.vi, vror.vx, vror.vv, vrol.vx, vrol.vv.	2025-08-12 12:05:03 -03:00
Sergei Barannikov	2f9f92ad01	[TableGen] Use getValueAsOptionalDef to simplify code (NFC) (#153170 )	2025-08-12 17:44:01 +03:00
Philip Reames	d8ce19ae6b	Build fix after bbde6b	2025-08-12 07:35:06 -07:00
Tommaso Fellegara	54d0061809	[Utils] update_llc_test_checks.py: updated the regexp for ARM target (#148287 ) Fixes #147485. I changed the regexp for the ARM targets making the part `@+[\t]*@"?(?P=func)"?` optional since when the -asm-verbose=false is passed it is not generated and this led to the issue.	2025-08-12 15:31:07 +01:00
Simon Pilgrim	bd3aa88802	[Headers][X86] Allow SSE MOVD/Q scalar<->vector cvt intrinsics to be used in constexpr (#153192 )	2025-08-12 15:29:16 +01:00
Akash Banerjee	4e6d510eb3	[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048 ) Add a new AutomapToTargetData pass. This gathers the declare target enter variables which have the AUTOMAP modifier and adds omp.declare_target_enter/exit mapping directives for fir.alloca and fir.free operations on the AUTOMAP enabled variables. Automap Ref: OpenMP 6.0 section 7.9.7.	2025-08-12 15:18:15 +01:00
Sergei Barannikov	c69355e7d1	[TableGen] Use `getValueInit` to reduce code duplication (NFC) (#153167 )	2025-08-12 17:18:00 +03:00
Rahul Joshi	89ea9df6a2	[NFCI][TableGen] Minor improvements to `Intrinsic::getAttributes` (#152761 ) This change implements several small improvements to `Intrinsic::getAttributes`: 1. Use `SequenceToOffsetTable` to emit `ArgAttrIdTable`. This enables reuse of entries when they share a common prefix. This reduces the size of this table from 546 to 484 entries, which is 248 bytes. 2. Fix `AttributeComparator` to purely compare argument attributes and not look at function attributes. This avoids unnecessary duplicates in the uniqueing process and eliminates 2 entries from `ArgAttributesInfoTable`, saving 8 bytes. 3. Improve `Intrinsic::getAttributes` code to not initialize all entries of `AS` always. Currently, we initialize all entries of the array `AS` even if we may not use all of them. In addition to the runtime cost, for Clang release builds, since the initialization loop is unrolled, it consumes ~330 bytes of code to initialize the `AS` array. Address this by declaring the storage for AS using just a char array with appropriate `alignas` (similar to how `SmallVectorStorage` defines its inline elements).	2025-08-12 07:15:08 -07:00
Gao Yanfeng	24f5385a85	[MLIR][NVVM] Support generating all the ldmatrix intrinsics from NVVM ops (#148783 ) Previously, the NVVM dialect's ldmatrix operation could only generate a limited subset of the available NVVM ldmatrix intrinsics. The intrinsics generating new ops introduced in Blackwell are not accessible through the NVVM ops. This commit extends the ldmatrix operation to support all available ldmatrix intrinsics.	2025-08-12 15:13:15 +01:00
Akash Banerjee	e1a694cd16	[NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls	2025-08-12 15:06:03 +01:00
Amit Tiwari	2074e1320f	[Clang][OpenMP] Non-contiguous strided update (#144635 ) This patch handles the strided update in the `#pragma omp target update from(data[a:b:c])` directive where 'c' represents the strided access leading to non-contiguous update in the `data` array when the offloaded execution returns the control back to host from device using the `from` clause. Issue: Clang CodeGen where info is generated for the particular `MapType` (to, from, etc), it was failing to detect the strided access. Because of this, the `MapType` bits were incorrect when passed to runtime. This led to incorrect execution (contiguous) in the libomptarget runtime code. Added a minimal testcase that verifies the working of the patch.	2025-08-12 19:32:15 +05:30
Simon Pilgrim	72b53cde1c	[X86] xop-builtins.c - add C/C++ test coverage	2025-08-12 14:44:10 +01:00
Simon Pilgrim	9442b4ea25	[X86] mmx-builtins.c - use __v8qs initializer instead of _mm_setr_pi8 to correctly run on -fno-signed-char targets	2025-08-12 14:44:10 +01:00
Elizaveta Noskova	bbde6be841	[llvm] Support multiple save/restore points in mir (#119357 ) Currently mir supports only one save and one restore point specification: ``` savePoint: '%bb.1' restorePoint: '%bb.2' ``` This patch provides the possibility to have multiple save and multiple restore points in mir: ``` savePoints: - point: '%bb.1' restorePoints: - point: '%bb.2' ``` Shrink-Wrap points split Part 3. RFC: https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581 Part 1: https://github.com/llvm/llvm-project/pull/117862 Part 2: https://github.com/llvm/llvm-project/pull/119355 Part 4: https://github.com/llvm/llvm-project/pull/119358 Part 5: https://github.com/llvm/llvm-project/pull/119359	2025-08-12 16:34:29 +03:00
Ricardo Jesus	ef5e65d27b	[AArch64] Fix stp kill when merging forward. (#152994 ) As an alternative to #149177, iterate through all instructions in `AArch64LoadStoreOptimizer`.	2025-08-12 14:19:43 +01:00
Akash Banerjee	c1f410779a	Revert "[NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls" This reverts commit b8104fa320f006bacd3e16afb431b5980dd5000a.	2025-08-12 14:18:57 +01:00
Matthias Springer	ef2b8805bf	[mlir][vector] Implement `InferTypeOpInterface` on `vector.to_elements` (#153172 ) Just for convenience. This auto-generates an additional builder that infers the result type.	2025-08-12 15:15:30 +02:00
Michał Górny	475921d2dc	[runtimes] Append `-nostd*++` flags only for Clang (#151930 ) Append `-nostdlib++` and `-nostdinc++` flags to `CMAKE_REQUIRED_FLAGS` only if we are actually building with Clang. These flags are also passed to the C compiler, which is not allowed in GCC. Since CMake implicitly performs some tests using the C compiler, this can lead to incorrect check results. This should be safe, since FWIU we only need them when bootstrapping Clang. Even though we know that Clang supports these flags, we still need to explicitly check if they work, as in some scenarios adding `-nostdlib++` actually breaks the build. See PR #108357 for examples of that. Fixes #90332 Signed-off-by: Michał Górny <mgorny@gentoo.org>	2025-08-12 15:14:30 +02:00
Florian Hahn	424258947e	[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879 ) Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: https://github.com/llvm/llvm-project/pull/152879	2025-08-12 14:13:13 +01:00
Nikita Popov	9d96d01b42	[IR] Add offset stripping test with mixed const/variable offsets (NFC) Regression test for: `a7edc95c79 (commitcomment-163691175)`	2025-08-12 15:12:16 +02:00
Leon Clark	9115bef8ee	[VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138 ) Reopen #128938. Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions. --------- Co-authored-by: Leon Clark <leoclark@amd.com> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-08-12 14:08:37 +01:00
Akash Banerjee	b8104fa320	[NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls	2025-08-12 14:05:00 +01:00
Petar Avramovic	f88be47fbf	AMDGPU/GlobalISel: Switch a few tests to new-reg-bank-select (#153174 )	2025-08-12 15:03:31 +02:00
choikwa	1d30f71b21	[AMDGPU] Make ds/global load intrinsics IntrArgMemOnly (#152792 ) This along with IntrReadMem means that the Intrinsic only reads memory through the given argument ptr and its derivatives. This allows passes like Inliner to attach alias.scope to the call instruction as it sees that no other memory is accessed. Discovered via SWDEV-543741 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-08-12 21:51:39 +09:00
David Green	5d099c2831	[AArch64][GlobalISel] Add 128bit insert and extract vector test coverage. NFC	2025-08-12 13:50:36 +01:00
Igor Wodiany	0f346a48a8	[mlir][spirv] Enable serializer to write SPIR-V modules into separate files (#152678 ) By default, `mlir-translate` writes all output into a single file even when `--split-input-file` is used. This is not an issue for text files as they can be easily split with an output separator. However, this causes issues with binary SPIR-V modules. Firstly, a binary file with multiple modules is not a valid SPIR-V, but will be created if multiple modules are specified in the same file and separated by "// -----". This does not cause issues with MLIR internal tools but does not work with SPIRV-Tools. Secondly, splitting binary files after serialization is non-trivial, when compared to text files, so using an external tool is not desirable. This patch adds a SPIR-V serialization option that writes SPIR-V modules to separate files in addition to writing them to the `mlir-translate` output file. This is not the ideal solution and ideally `mlir-translate` would allow generating multiple output files when `--split-input-file` is used, however adding such functionality is again non-trivial due to how processing and splitting is done: output is written to a single `os` that is passed around, and the number of split buffers is not known ahead of time. As such, I propose to have a SPIR-V internal option that will dump modules to files in the form they can be processed by `spirv-val`. The behaviour of the newly added argument may be confusing, but benefits from being internal to SPIR-V target. Alternatively, we could expose the spirv option in `mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp`, and slice the output file on the SPIR-V magic number, and not keep the file generated by default by `mlir-translate`. This would be a bit cleaner in API sense, as it would not generate the additional file containing all modules together. However, it pushes SPIR-V specific code into the generic part of the `mlir-translate` and slicing is potentially more error prone than just writing a single module after it was serialized.	2025-08-12 13:48:39 +01:00
Orlando Cazalet-Hyams	ba5ff57917	[Dexter] Track DAP capabilities (#152715 )	2025-08-12 13:47:33 +01:00
XChy	2a49719525	[SelectionDAGBuilder] Look for appropriate INLINEASM_BR instruction to verify (#152591 ) Partially fix #149023. The original code `MRI.def_begin(Reg)->getParent()` may return the incorrect MI, as the physical register `Reg` may have multiple definitions. This patch selects the correct MI to verify by comparing the MBB of each definition. New testcase hangs with -O1/2/3 enabled. The BranchFolding may be to blame.	2025-08-12 12:37:56 +00:00
Andrei Safronov	48da8489f2	[Xtensa] Add esp32/esp8266 cpus implementation. (#152409 ) Add Xtensa esp32 and esp8266 cpus. Implement target parser to recognise Xtensa hardware features.	2025-08-12 15:17:36 +03:00

548254 Commits