llvm-project

Author	SHA1	Message	Date
Pengcheng Wang	17a98f85c2	[RISCV] Optimize the spill/reload of segment registers (#153184 ) The simplest way is: 1. Save `vtype` to a scalar register. 2. Insert a `vsetvli`. 3. Use segment load/store. 4. Restore `vtype` via `vsetvl`. But `vsetvl` is usually slow, so this PR does not take that approach. Instead, we use wider whole-register load/store instructions if the register encoding is aligned. We have done the same optimization for COPY in https://github.com/llvm/llvm-project/pull/84455. We found this suboptimal implementation when porting some video codec kernels via RVV intrinsics.	2025-08-21 16:38:53 +08:00
Ross Brunton	2e74cc6c04	[Offload][NFC] Use a sensible order for APIGen (#154518 ) The order in which entries in the tablegen API files are iterated is not the order in which they appear in the file. To avoid any issues with the order changing in future, we now generate all definitions of a certain class before any class that can use them. This is NFC; the definitions don't actually change, only the order in which they appear in the OffloadAPI.h header.	2025-08-21 09:38:21 +01:00
Ross Brunton	273ca1f77b	[Offload] Fix `OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE` on AMD (#154521 ) This wasn't handled with the normal info API, so needs special handling.	2025-08-21 09:37:58 +01:00
Luke Lau	a9692391f6	[RISCV] Move volatile check to isCandidate in VL optimizer. NFC (#154685 ) This keeps it closer to the other legality checks like the FP exceptions check. It also means that isSupportedInstr only needs to check the opcode, which allows it to be replaced with a TSFlags based check in a later patch.	2025-08-21 16:37:10 +08:00
Fraser Cormack	5c411b3c0b	[libclc] Use elementwise ctlz/cttz builtins for CLC clz/ctz (#154535 ) Using the elementwise builtin optimizes the vector case; instead of scalarizing we can compile directly to the vector intrinsics.	2025-08-21 09:32:03 +01:00
Michael Buch	f2aedc21f9	[clang][DebugInfo][test] Move debug-info tests from CodeGenCXX to DebugInfo directory (#154538 ) This patch works towards consolidating all Clang debug-info into the `clang/test/DebugInfo` directory (https://discourse.llvm.org/t/clang-test-location-of-clang-debug-info-tests/87958). Here we move only the `clang/test/CodeGenCXX` tests. I created a `CXX` subdirectory for now because many of the tests I checked actually did seem C++-specific. There is probably overlap between the `Generic` and `CXX` subdirectories, but I haven't gone through and audited them all. The list of files I came up with is: 1. searched for anything with `debug-info` in the filename 2. searched for occurrences of `debug-info-kind` in the tests There are a couple of tests in `clang/test/CodeGenCXX` that still set `-debug-info-kind`. They probably don't need to do that, but I'm not changing that as part of this PR.	2025-08-21 09:26:08 +01:00
Simon Pilgrim	e4b110ab9f	[Headers][X86] Allow FMA3/FMA4 vector intrinsics to be used in constexpr (#154558 ) Now that #152455 is done, we can make all the vector FMA intrinsics that wrap __builtin_elementwise_fma constexpr. Fixes #154555	2025-08-21 09:09:40 +01:00
Benjamin Maxwell	810ea69edd	[LiveRegUnits] Exclude runtime defined liveins when computing liveouts (#154325 ) These liveins are not defined by predecessors, so should not be considered as liveouts in predecessor blocks. This resolves: - https://github.com/llvm/llvm-project/pull/149062#discussion_r2285072001 - https://github.com/llvm/llvm-project/pull/153417#issuecomment-3199972351	2025-08-21 09:06:32 +01:00
Aleksandr Platonov	ff5767a02c	[clangd] Add feature modules registry (#153756 ) This patch adds a feature module registry, as discussed with @kadircet in [discourse](https://discourse.llvm.org/t/rfc-registry-for-feature-modules/87733). Feature modules that are added to the feature module set from registry entries can't expose a public API, but can still be used via the `FeatureModule` interface.	2025-08-21 10:30:37 +03:00
Stephan T. Lavavej	f60ff00939	[libcxx][test] Silence nodiscard warnings (#154622 ) MSVC's STL marks `std::make_shared`, `std::allocate_shared`, `std::bitset::to_ulong`, and `std::bitset::to_ullong` as `[[nodiscard]]`, which causes these libcxx tests to emit righteous warnings. They should use the traditional `(void)` cast technique to ignore the return values.	2025-08-21 00:28:17 -07:00
Dominik Adamski	b69fd34e76	[Offload] Add oneInterationPerThread param to loop device RTL (#151959 ) Currently, Flang can generate no-loop kernels for all OpenMP target kernels in the program if the flags -fopenmp-assume-teams-oversubscription or -fopenmp-assume-threads-oversubscription are set. If we add an additional parameter, we can choose in the future which OpenMP kernels should be generated as no-loop kernels. This PR doesn't modify current behavior of oversubscription flags. RFC for no-loop kernels: https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517	2025-08-21 09:03:56 +02:00
Mythreya Kuricheti	0977a6d9e7	[clang][CodeComplete] Consider qualifiers of explicit object parameters in overload suggestions (#154041 ) Fixes https://github.com/llvm/llvm-project/issues/109608	2025-08-21 02:32:41 -04:00
Timm Baeder	e0acf6592b	[clang][bytecode] Call CheckFinalLoad in all language modes (#154496 ) Fixes #153997	2025-08-21 08:24:09 +02:00
Yi Kong	1ff7c8bf0d	[compiler-rt] Fix musl build The change in PR #154268 introduced a dependency on the `__GLIBC_PREREQ` macro, which is not defined in musl libc. This caused the build to fail in environments using musl. This patch fixes the build by including `sanitizer_common/sanitizer_glibc_version.h`. This header provides a fallback definition for `__GLIBC_PREREQ` when LLVM is built against non-glibc C libraries, resolving the compilation error.	2025-08-21 15:19:06 +09:00
Sergei Barannikov	b96d5c2452	[TableGen][DecoderEmitter] Outline InstructionEncoding constructor (NFC) (#154673 ) It is going to grow, so it makes sense to move its definition out of class. Instead, inline `populateInstruction()` into it. Also, rename a couple of methods to better convey their meaning.	2025-08-21 06:08:57 +00:00
Abhishek Kaushik	62aaa96d6f	[SDAG][X86] Added method to scalarize `STRICT_FSETCC` (#154486 ) Fixes #154485	2025-08-21 11:27:27 +05:30
Carlos Galvez	3baddbbb0a	Do not trigger -Wmissing-noreturn on lambdas prior to C++23 (#154545 ) Fixes #154493 Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>	2025-08-21 07:30:57 +02:00
Jim Lin	3a715107c2	[RISCV] Fold argstr into class for XSMTVDot instructions. NFC. All of them use the same argstr "$vd, $vs1, $vs2".	2025-08-21 13:12:46 +08:00
Craig Topper	2d3d8df0e0	[RISCV] Use RVPTernary_rrr for a few more instructions. This doesn't really affect the assembler, but will be important when we eventually do codegen.	2025-08-20 21:13:40 -07:00
Sergei Barannikov	d6679d5a5f	[Target] Remove SoftFail field on targets that don't use it (NFC) (#154659 ) That is, on all targets except ARM and AArch64. This field used to be required due to a bug that was fixed long ago by 23423c0ea8d414e56081cb6a13bd8b2cc91513a9.	2025-08-21 05:21:42 +03:00
Jordan Rupprecht	918c0ac762	[bazel] Port #154616 : LDBG in ConvertToLLVMPass (#154661 )	2025-08-20 21:21:14 -05:00
Jordan Rupprecht	90d601d50b	[bazel][LLVMIR] Port #145899 : Add target attrs (#154660 )	2025-08-21 01:59:41 +00:00
Sirui Mu	91569fa030	[CIR][NFC] Use Op::create to create CIR operations in CIRGenBuilder (#154540 )	2025-08-21 09:46:45 +08:00
Aiden Grossman	c811f522f6	[ProfCheck] Add list of xfail tests (#154655 ) This patch contains a list of tests that are currently failing in the LLVM_ENABLE_PROFCHECK=ON build. This enables passing them to lit through the LIT_XFAIL env variable. This is necessary for getting a buildbot spun up to catch regressions while work is being done to fix the existing issues. We need to keep this in the LLVM tree so that tests can be removed from the list at the same time the passes causing issues are fixed. Issue #147390	2025-08-21 01:28:05 +00:00
Matt Arsenault	e414585545	AMDGPU: Add baseline test for mfma rewrite with phi (#153021 )	2025-08-21 10:25:05 +09:00
Matt Arsenault	bcf41e03c7	AMDGPU: Add baseline test for vgpr mfma with copied-from AGPR (#153020 )	2025-08-21 10:24:27 +09:00
Matt Arsenault	eefad7438c	AMDGPU: Handle rewriting VGPR MFMA to AGPR with subregister copies (#153019 ) This should address the case where the result isn't fully used, resulting in partial copy bundles from the MFMA result.	2025-08-21 01:17:03 +00:00
Jim Lin	fd28257195	[DAGCombiner] Fold umax/umin operations with vscale operands (#154461 ) If umax/umin operations have vscale operands, they can be constant folded.	2025-08-21 09:15:40 +08:00
PiJoules	3c8652e737	[compiler-rt][Fuchsia] Change GetMaxUserVirtualAddress to invoke syscall (#153309 ) LSan was recently refactored to call GetMaxUserVirtualAddress for diagnostic purposes. This leads to failures for some of our downstream tests which only run with lsan. This occurs because GetMaxUserVirtualAddress depends on setting up shadow via a call to __sanitizer_shadow_bounds, but shadow bounds aren't set for standalone lsan because it doesn't use shadow. This updates the function to invoke the same syscall used by __sanitizer_shadow_bounds calls for getting the memory limit. Ideally this function would only be called once since we only need to get the bounds once. More context in https://fxbug.dev/437346226.	2025-08-20 18:06:19 -07:00
Craig Topper	8cb6bfe05a	[RISCV] Reduce ManualCodeGen for RVV intrinsics with rounding mode. NFC Operate directly on the existing Ops vector instead of copying to a new vector. This is similar to what the autogenerated codegen does for other intrinsics.	2025-08-20 17:53:46 -07:00
Matt Arsenault	744cd8a9c6	AMDGPU: Add some baseline test for mfma rewrite with subregister copies (#153018 ) Currently only cases rooted at a full copy of an MFMA result are handled. Prepare to relax that by testing more intricate subregister usage. Currently only full copies are handled, add some tests to help work towards handling subregisters.	2025-08-21 00:39:39 +00:00
Matt Arsenault	156f3fce54	AMDGPU: Handle rewriting VGPR MFMAs with immediate src2 (#153016 )	2025-08-21 09:09:24 +09:00
Matt Arsenault	3a0fa12752	DAG: Handle half spanning extract_subvector in type legalization (#154101 ) Previously it would just assert if the extract needed elements from both halves. Extract the individual elements from both halves and create a new vector, as the simplest implementation. This could try to do better and create a partial extract or shuffle (or maybe that's best left for the combiner to figure out later). Fixes secondary issue noticed as part of #153808	2025-08-21 00:05:12 +00:00
Elvis Wang	d611a9ca15	[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. (#154482 ) `VPEVLBasedIVPHIRecipe` will lower to a VPInstruction scalar phi and generate a scalar phi. This recipe will only occupy a scalar register, just like other phi recipes. This patch fixes the register usage for `VPEVLBasedIVPHIRecipe` from vector to scalar, which is close to the generated vector IR. https://godbolt.org/z/6Mzd6W6ha shows no register spills when choosing `<vscale x 16>`. Note that this test is basically copied from AArch64.	2025-08-21 07:39:01 +08:00
Shih-Po Hung	cf0e86118d	[VPlan] Handle canonical VPWidenIntOrFpInduction in branch-condition simplification (#153539 ) SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a few PHI recipes in the loop header. With more IV-step optimizations, the canonical widen-canonical-iv can be replaced by a canonical VPWidenIntOrFpInduction, which the pass did not handle, causing regressions (missed simplifications). This patch replaces canonical VPWidenIntOrFpInduction with a StepVector in the vector preheader since the vector loop region only executes once.	2025-08-21 07:34:54 +08:00
Kazu Hirata	9aae8ef329	[Scalar] Use SmallPtrSet directly instead of SmallSet (NFC) (#154473 ) I'm trying to remove the redirection in SmallSet.h: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; to make it clear that we are using SmallPtrSet. There are only a handful of places that rely on this redirection. This patch replaces SmallSet with SmallPtrSet where the element type is a pointer.	2025-08-20 16:30:39 -07:00
Kazu Hirata	7be06dbd43	[lldb] Use SmallPtrSet directly instead of SmallSet (NFC) (#154472 ) I'm trying to remove the redirection in SmallSet.h: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; to make it clear that we are using SmallPtrSet. There are only a handful of places that rely on this redirection. This patch replaces SmallSet with SmallPtrSet where the element type is a pointer.	2025-08-20 16:30:31 -07:00
Kazu Hirata	8a5b6b302e	[flang] Use SmallPtrSet directly instead of SmallSet (NFC) (#154471 ) I'm trying to remove the redirection in SmallSet.h: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; to make it clear that we are using SmallPtrSet. There are only a handful of places that rely on this redirection. This patch replaces SmallSet with SmallPtrSet where the element type is a pointer.	2025-08-20 16:30:24 -07:00
Min-Yih Hsu	db0eceaa8b	[AMDGPU] Fix uncaught changes made by AMDGPUPreloadKernelArgumentsPass (#154645 ) #153975 added a new test, `test/CodeGen/AMDGPU/disable-preload-kernargs.ll`, that triggers an assertion under `LLVM_ENABLE_EXPENSIVE_CHECKS` complaining about not invalidating analyses even when the Pass made changes. It was caused by the fact that the Pass only invalidates the analyses when number of explicit arguments is greater than zero, while it is possible that some functions will be removed even when there isn't any explicit argument, hence the missed invalidation.	2025-08-20 16:23:23 -07:00
Matt Arsenault	ff5f396dac	AMDGPU: Handle rewriting non-tied MFMA to AGPR form (#153015 ) If src2 and dst aren't the same register, to fold a copy to AGPR into the instruction we also need to reassign src2 to an available AGPR. All the other uses of src2 also need to be compatible with the AGPR replacement in order to avoid inserting other copies somewhere else. Perform this transform after verifying that all other uses are compatible with AGPR and that an AGPR is available at all points (which effectively means rewriting a full chain of mfmas and load/store at once).	2025-08-21 08:16:56 +09:00
Renaud Kauffmann	3856bb6bbf	[flang] [acc] Adding allocation to the recipe of scalar allocatables (#154643 ) Currently the privatization recipe of a scalar allocatable is as follows: ``` acc.private.recipe @privatization_ref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> init { ^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>): %0 = fir.alloca !fir.box<!fir.heap<i32>> %1:2 = hlfir.declare %0 {uniq_name = "acc.private.init"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>) acc.yield %1#0 : !fir.ref<!fir.box<!fir.heap<i32>>> } ``` This change adds the allocation for the scalar.	2025-08-20 16:04:57 -07:00
Mehdi Amini	62b29d9f76	[MLIR] Adopt LDBG() debug macro in BytecodeWriter.cpp (NFC) (#154642 )	2025-08-20 22:45:39 +00:00
Mehdi Amini	908eebcb93	[MLIR] Adopt LDBG() macro in PDL ByteCodeExecutor (NFC) (#154641 )	2025-08-20 22:40:52 +00:00
Ely Ronnen	8b64cd8be2	[lldb-dap] Add module symbol table viewer to VS Code extension #140626 (#153836 ) - VS Code extension: - Add module symbol table viewer using [Tabulator](https://tabulator.info/) for sorting and formatting rows. - Add context menu action to the modules tree. - lldb-dap - Add `DAPGetModuleSymbolsRequest` to get symbols from a module. Fixes #140626 [Screencast From 2025-08-15 19-12-33.webm](https://github.com/user-attachments/assets/75e2f229-ac82-487c-812e-3ea33a575b70)	2025-08-21 00:31:48 +02:00
Joseph Huber	27fc9671f9	Revert "[libc] Enable wide-read memory operations by default on Linux (#154602 )" This reverts commit c80d1483c6d787edf62ff9e86b1e97af5eb5abf9.	2025-08-20 17:27:13 -05:00
Craig Topper	2cb7c46bf0	[RISCV] Add missing 'OrP' to comment in RISCVInstrInfoZb.td. NFC	2025-08-20 15:27:03 -07:00
Joseph Huber	c80d1483c6	[libc] Enable wide-read memory operations by default on Linux (#154602 ) Summary: This patch changes the Linux build to use wide reads in the memory operations by default. These memory functions will now potentially read outside of the bounds explicitly allowed by the current function. While technically undefined behavior in the standard, plenty of C library implementations do this. It will not cause a segmentation fault on Linux as long as you do not cross a page boundary, and because we are only reading memory it should not have atomic effects.	2025-08-20 17:17:12 -05:00
Craig Topper	ac8f0bb070	[RISCV] Reduce ManualCodeGen for segment load/store intrinsics. NFC Operate directly on the existing Ops vector instead of copying to a new vector. This is similar to what the autogenerated codegen does for other intrinsics. This reduced the clang binary size by ~96kb on my local Release+Asserts build.	2025-08-20 15:02:24 -07:00
Sergei Barannikov	46343ca374	[TableGen][DecoderEmitter] Add DecoderMethod to InstructionEncoding (NFC) (#154477 ) We used to abuse Operands list to store instruction encoding's DecoderMethod there. Let's store it in the InstructionEncoding class instead, where it belongs.	2025-08-20 21:59:59 +00:00
Mehdi Amini	dbbd3f0d07	[MLIR] Adopt LDBG() macro in Affine/Analysis/Utils.cpp (NFC) (#154626 )	2025-08-20 21:56:03 +00:00


549412 Commits