549622 Commits

Author SHA1 Message Date
David Green
5db67e1c86
[GlobalISel] Add a fadd 0.0 combine with nsz (#153748)
This is surprisingly helpful, coming up a lot from fadd reductions.
2025-08-21 10:19:39 +01:00
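A note on why the combine above needs nsz: under IEEE-754 round-to-nearest, `-0.0 + 0.0` evaluates to `+0.0`, so `x + 0.0` is an identity for every non-NaN input except negative zero. The sketch below demonstrates this at run time. The helper names `addPositiveZero` and `foldIsExact` are hypothetical and are not part of GlobalISel; the sketch also assumes the code is built without fast-math options.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: computes x + 0.0 at run time. The volatile local
// keeps the compiler from folding the addition away at compile time.
inline double addPositiveZero(double x) {
  volatile double zero = 0.0;
  return x + zero;
}

// True when replacing `x + 0.0` with `x` preserves both value and sign bit.
inline bool foldIsExact(double x) {
  assert(!std::isnan(x) && "NaN never compares equal, so it is excluded here");
  double sum = addPositiveZero(x);
  return sum == x && std::signbit(sum) == std::signbit(x);
}
```

Only the `-0.0` input distinguishes the two forms, and nsz is the flag that says the sign of a zero result may be ignored.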
Peter Smith
bcf09c1bc7
[ARM][Disassembler] Advance IT State when instruction is unknown (#154531)
When an instruction that the disassembler does not recognize is in an IT
block, we should still advance the IT state. Otherwise the IT state
spills over into the next recognized instruction, which is incorrect.

We want to avoid disassembly like:
it eq
<unknown> // Often because disassembler has insufficient target info. 
addeq r0,r0,r0 // eq spills over into add.

Fixes #150569
2025-08-21 10:14:30 +01:00
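The behaviour described above can be sketched with a much simplified model of IT-block tracking. Everything here (`ItState`, `Insn`, `render`) is hypothetical and only illustrates the idea of consuming one IT slot per instruction, recognized or not; it is not the ARM disassembler's actual data structure.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical, simplified model of Thumb IT-block tracking.
struct ItState {
  std::deque<std::string> pending; // one condition per remaining slot

  // Consume one slot and return its condition ("" outside an IT block).
  std::string advance() {
    if (pending.empty())
      return "";
    std::string cond = pending.front();
    pending.pop_front();
    return cond;
  }
};

struct Insn {
  std::string mnemonic;             // empty means "not recognized"
  std::vector<std::string> itConds; // non-empty only for an IT instruction
};

// Returns one rendered line per instruction.
inline std::vector<std::string> render(const std::vector<Insn> &insns) {
  ItState it;
  std::vector<std::string> out;
  for (const Insn &insn : insns) {
    if (!insn.itConds.empty()) { // an IT instruction opens a new block
      assert(insn.itConds.size() <= 4 && "an IT block covers at most 4 instructions");
      it.pending.assign(insn.itConds.begin(), insn.itConds.end());
      out.push_back(insn.mnemonic);
      continue;
    }
    // Advance for every instruction, recognized or not, so the condition
    // cannot leak onto the next recognized instruction.
    std::string cond = it.advance();
    out.push_back(insn.mnemonic.empty() ? "<unknown>" : insn.mnemonic + cond);
  }
  return out;
}
```

With the sequence `it eq`, an unrecognized instruction, then `add`, this model renders the last line as `add` rather than `addeq`.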
Ilya Biryukov
26d4e56be8 [Clang] Fix dedup-types-builtin.cpp test when -std=c++20
It was previously failing because of a warning marking a C++20 feature
as an extension.

This is a follow-up to 85043c1c146fd5658ad4c5b5138e58994333e645 that
introduced the test.
2025-08-21 11:05:02 +02:00
Benjamin Maxwell
bfab8085af
[libunwind] Add support for the AArch64 "Vector Granule" (VG) register (#153565)
The vector granule (AArch64 DWARF register 46) is a pseudo-register that
contains the available size in bits of SVE vector registers in the
current call frame, divided by 64. The vector granule can be used in
DWARF expressions to describe SVE/SME stack frame layouts (e.g., the
location of SVE callee-saves).

The first time VG is evaluated (if not already set), it is initialized
to the result of evaluating a "CNTD" instruction (this assumes SVE is
available).

To support SME, the value of VG can change per call frame; this is
currently handled like any other callee-save and is intended to support
the unwind information implemented in #152283. This limits how VG is
used in the CFI information of functions with "streaming-mode changes"
(mode changes that change the SVE vector length), to make the unwinder's
job easier.
2025-08-21 10:01:40 +01:00
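The relationship between VG and the vector length described above is simple arithmetic, sketched here. The function names are hypothetical and the back-to-back slot layout in `scalableSlotOffset` is an assumption for illustration only; real SVE/SME frame layouts are defined by the target and its unwind information.

```cpp
#include <cassert>
#include <cstdint>

// VG is the SVE vector length in bits divided by 64, i.e. the number of
// 64-bit granules per vector register (the value a CNTD instruction yields).
inline uint64_t vectorGranule(uint64_t vectorLengthBits) {
  assert(vectorLengthBits % 128 == 0 && "SVE vector lengths are multiples of 128 bits");
  return vectorLengthBits / 64;
}

// Size in bytes of one SVE vector register for a given VG value.
inline uint64_t vectorBytesFromVG(uint64_t vg) { return vg * 8; }

// Byte offset of the Nth vector-sized slot in a hypothetical save area where
// slots are laid out back to back.
inline uint64_t scalableSlotOffset(uint64_t vg, uint64_t slotIndex) {
  return slotIndex * vectorBytesFromVG(vg);
}
```

Because the offset scales with VG, a DWARF expression that reads VG can locate such slots without knowing the vector length at compile time.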
Lang Hames
9039b591d0 [orc-rt] Add rtti header and unit tests.
The orc-rt extensible RTTI mechanism is used to provide simple dynamic RTTI
checks for orc-rt types that do not depend on standard C++ RTTI (meaning that
they will work equally well for programs compiled with -fno-rtti).
2025-08-21 18:59:46 +10:00
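For readers unfamiliar with the technique, the following is a minimal sketch of an extensible RTTI scheme that works without compiler RTTI. It is modelled loosely on the general LLVM-style pattern of identifying classes by the address of a static member; the names `RttiRoot`, `RttiExtends`, `isa`, and `cast` are assumptions for this sketch and may not match the orc-rt header.

```cpp
#include <cassert>

// Root of the hierarchy. Each class is identified by the address of a
// static char, so typeid and dynamic_cast are never needed.
class RttiRoot {
public:
  virtual ~RttiRoot() = default;
  static const void *classID() { return &ID; }
  virtual bool isA(const void *id) const { return id == classID(); }

private:
  static char ID;
};
char RttiRoot::ID = 0;

// CRTP helper: ThisT derives from ParentT and answers isA for its own ID
// before deferring to the parent chain.
template <typename ThisT, typename ParentT> class RttiExtends : public ParentT {
public:
  static const void *classID() { return &ThisT::ID; }
  bool isA(const void *id) const override {
    return id == classID() || ParentT::isA(id);
  }
};

template <typename T> bool isa(const RttiRoot &obj) {
  return obj.isA(T::classID());
}

template <typename T> T &cast(RttiRoot &obj) {
  assert(isa<T>(obj) && "cast to an incompatible type");
  return static_cast<T &>(obj);
}

// Example hierarchy (hypothetical types).
class Shape : public RttiExtends<Shape, RttiRoot> {
public:
  static char ID;
};
char Shape::ID = 0;

class Circle : public RttiExtends<Circle, Shape> {
public:
  static char ID;
};
char Circle::ID = 0;
```

The checks are plain pointer comparisons along the inheritance chain, which is why the approach behaves the same under -fno-rtti.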
Lang Hames
ca0a8d99bb [orc-rt] Add bitmask-enum helper utilities.
ORC_RT_MARK_AS_BITMASK_ENUM and ORC_RT_DECLARE_ENUM_AS_BITMASK can be used to
easily add support for bitmask operators (&, |, ^, ~) to enum types.

This code was derived from LLVM's include/llvm/ADT/BitmaskEnum.h header.
2025-08-21 18:59:46 +10:00
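As an illustration of what bitmask operator support for an enum can look like, here is a small trait-based sketch. It does not reproduce the ORC_RT_MARK_AS_BITMASK_ENUM or ORC_RT_DECLARE_ENUM_AS_BITMASK macros; the trait `IsBitmaskEnum`, the alias `BitmaskResult`, and the `Perm` enum are hypothetical, and `operator~` here flips all underlying bits rather than masking to the largest enumerator.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical opt-in trait: specialize to std::true_type to enable the
// operators below for an enum type.
template <typename E> struct IsBitmaskEnum : std::false_type {};

// Resolves to E only for opted-in enums; otherwise the operator is removed
// from overload resolution.
template <typename E>
using BitmaskResult = typename std::enable_if<IsBitmaskEnum<E>::value, E>::type;

template <typename E> constexpr BitmaskResult<E> operator|(E a, E b) {
  using U = typename std::underlying_type<E>::type;
  return static_cast<E>(static_cast<U>(a) | static_cast<U>(b));
}
template <typename E> constexpr BitmaskResult<E> operator&(E a, E b) {
  using U = typename std::underlying_type<E>::type;
  return static_cast<E>(static_cast<U>(a) & static_cast<U>(b));
}
template <typename E> constexpr BitmaskResult<E> operator^(E a, E b) {
  using U = typename std::underlying_type<E>::type;
  return static_cast<E>(static_cast<U>(a) ^ static_cast<U>(b));
}
template <typename E> constexpr BitmaskResult<E> operator~(E a) {
  using U = typename std::underlying_type<E>::type;
  return static_cast<E>(~static_cast<U>(a));
}

// Example enum opting in.
enum class Perm : unsigned { None = 0, Read = 1, Write = 2, Exec = 4 };
template <> struct IsBitmaskEnum<Perm> : std::true_type {};

inline void bitmaskExample() {
  Perm rw = Perm::Read | Perm::Write;
  assert((rw & Perm::Write) == Perm::Write);
  assert((rw & Perm::Exec) == Perm::None);
  assert((rw ^ Perm::Read) == Perm::Write);
  assert((~Perm::Read & rw) == Perm::Write);
}
```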
Lang Hames
2380d0ad1d [orc-rt] Add preliminary math.h header and basic operations.
The initial operations, isPowerOf2 and nextPowerOf2, will be used in an upcoming
patch to add support for bitmask-enums.
2025-08-21 18:59:46 +10:00
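A sketch of how two such operations are commonly written follows. The signatures and the "strictly greater" convention for `nextPowerOf2` are assumptions borrowed from the way LLVM's own NextPowerOf2 is documented; the orc-rt versions may differ.

```cpp
#include <cassert>
#include <cstdint>

// True when exactly one bit is set; zero is not a power of two.
inline bool isPowerOf2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Smallest power of two strictly greater than x (assumed convention).
// Wraps around to 0 when the result does not fit in 64 bits.
inline uint64_t nextPowerOf2(uint64_t x) {
  // Smear the highest set bit into every lower position, then add one.
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;
  return x + 1;
}

inline void mathExample() {
  assert(isPowerOf2(64) && !isPowerOf2(0) && !isPowerOf2(6));
  assert(nextPowerOf2(5) == 8 && nextPowerOf2(8) == 16);
}
```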
Lang Hames
8a10fbb2cb [orc-rt] Expand span.h file comment. NFC. 2025-08-21 18:59:46 +10:00
Abhishek Kaushik
f9c20ba040
[X86] Rename fp80-strict-vec-cmp.ll to scalarize-strict-fsetcc.ll (#154688)
The test name in #154486 mentioned fp80, which the tests do not actually
cover.
2025-08-21 14:28:39 +05:30
David Green
d9d71bdc14
[AArch64] Move BSL generation to lowering. (#151855)
It is generally better to allow the target independent combines before
creating AArch64 specific nodes (providing they don't mess it up). This
moves the generation of BSL nodes to lowering, not a combine, so that
intermediate nodes are more likely to be optimized. There is a small
change in the constant handling to detect legalized buildvector
arguments correctly.

Fixes #149380 but not directly. #151856 contained a direct fix for
expanding the pseudos.
2025-08-21 09:54:42 +01:00
Pengcheng Wang
17a98f85c2
[RISCV] Optimize the spill/reload of segment registers (#153184)
The simplest way is:

1. Save `vtype` to a scalar register.
2. Insert a `vsetvli`.
3. Use segment load/store.
4. Restore `vtype` via `vsetvl`.

But `vsetvl` is usually slow, so this PR does not take that approach.

Instead, we use wider whole load/store instructions if the register
encoding is aligned. We have done the same optimization for COPY in
https://github.com/llvm/llvm-project/pull/84455.

We found this suboptimal implementation when porting some video codec
kernels via RVV intrinsics.
2025-08-21 16:38:53 +08:00
Ross Brunton
2e74cc6c04
[Offload][NFC] Use a sensible order for APIGen (#154518)
The order in which entries in the tablegen API files are iterated is not
the order in which they appear in the file. To avoid any issues with the
order changing in future, we now generate all definitions of a certain
class before any class that can use them.

This is an NFC change; the definitions don't actually change, just the
order in which they appear in the OffloadAPI.h header.
2025-08-21 09:38:21 +01:00
Ross Brunton
273ca1f77b
[Offload] Fix OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE on AMD (#154521)
This wasn't handled with the normal info API, so needs special handling.
2025-08-21 09:37:58 +01:00
Luke Lau
a9692391f6
[RISCV] Move volatile check to isCandidate in VL optimizer. NFC (#154685)
This keeps it closer to the other legality checks like the FP exceptions
check.
It also means that isSupportedInstr only needs to check the opcode,
which allows it to be replaced with a TSFlags based check in a later
patch.
2025-08-21 16:37:10 +08:00
Fraser Cormack
5c411b3c0b
[libclc] Use elementwise ctlz/cttz builtins for CLC clz/ctz (#154535)
Using the elementwise builtin optimizes the vector case; instead of
scalarizing we can compile directly to the vector intrinsics.
2025-08-21 09:32:03 +01:00
Michael Buch
f2aedc21f9
[clang][DebugInfo][test] Move debug-info tests from CodeGenCXX to DebugInfo directory (#154538)
This patch works towards consolidating all Clang debug-info into the
`clang/test/DebugInfo` directory
(https://discourse.llvm.org/t/clang-test-location-of-clang-debug-info-tests/87958).

Here we move only the `clang/test/CodeGenCXX` tests. I created a `CXX`
subdirectory for now because many of the tests I checked actually did
seem C++-specific. There is probably overlap between the `Generic` and
`CXX` subdirectory, but I haven't gone through and audited them all.

The list of files I came up with is:
1. searched for anything with `*debug-info*` in the filename
2. searched for occurrences of `debug-info-kind` in the tests

There's a couple of tests in `clang/test/CodeGenCXX` that still set
`-debug-info-kind`. They probably don't need to do that, but I'm not
changing that as part of this PR.
2025-08-21 09:26:08 +01:00
Simon Pilgrim
e4b110ab9f
[Headers][X86] Allow FMA3/FMA4 vector intrinsics to be used in constexpr (#154558)
Now that #152455 is done, we can make all the vector FMA intrinsics that wrap __builtin_elementwise_fma constexpr.

Fixes #154555
2025-08-21 09:09:40 +01:00
Benjamin Maxwell
810ea69edd
[LiveRegUnits] Exclude runtime defined liveins when computing liveouts (#154325)
These liveins are not defined by predecessors, so should not be 
considered as liveouts in predecessor blocks. This resolves:

- https://github.com/llvm/llvm-project/pull/149062#discussion_r2285072001
- https://github.com/llvm/llvm-project/pull/153417#issuecomment-3199972351
2025-08-21 09:06:32 +01:00
Aleksandr Platonov
ff5767a02c
[clangd] Add feature modules registry (#153756)
This patch adds a feature modules registry, as discussed with @kadircet on
[discourse](https://discourse.llvm.org/t/rfc-registry-for-feature-modules/87733).
Feature modules that are added to the feature module set from registry
entries can't expose a public API, but can still be used via the
`FeatureModule` interface.
2025-08-21 10:30:37 +03:00
Stephan T. Lavavej
f60ff00939
[libcxx][test] Silence nodiscard warnings (#154622)
MSVC's STL marks `std::make_shared`, `std::allocate_shared`,
`std::bitset::to_ulong`, and `std::bitset::to_ullong` as
`[[nodiscard]]`, which causes these libcxx tests to emit righteous
warnings. They should use the traditional `(void)` cast technique to
ignore the return values.
2025-08-21 00:28:17 -07:00
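The `(void)` cast technique mentioned above looks like this in practice. The function name `exerciseWithoutUsingResults` is hypothetical; the standard library calls are the ones named in the commit message.

```cpp
#include <bitset>
#include <cassert>
#include <memory>

// Calls whose results are deliberately ignored. The (void) cast marks the
// discard as intentional, which suppresses [[nodiscard]] warnings.
inline void exerciseWithoutUsingResults() {
  std::bitset<8> bits(0x2Aul);
  (void)std::make_shared<int>(42);
  (void)std::allocate_shared<int>(std::allocator<int>(), 42);
  (void)bits.to_ulong();
  (void)bits.to_ullong();
  // A result that is actually used needs no cast.
  assert(bits.to_ulong() == 42);
}
```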
Dominik Adamski
b69fd34e76
[Offload] Add oneInterationPerThread param to loop device RTL (#151959)
Currently, Flang can generate no-loop kernels for all OpenMP target
kernels in the program if the flags
-fopenmp-assume-teams-oversubscription or
-fopenmp-assume-threads-oversubscription are set.
If we add an additional parameter, we can choose
in the future which OpenMP kernels should be generated as no-loop
kernels.

This PR doesn't modify current behavior of oversubscription flags.

RFC for no-loop kernels:
https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517
2025-08-21 09:03:56 +02:00
Mythreya Kuricheti
0977a6d9e7
[clang][CodeComplete] Consider qualifiers of explicit object parameters in overload suggestions (#154041)
Fixes https://github.com/llvm/llvm-project/issues/109608
2025-08-21 02:32:41 -04:00
Timm Baeder
e0acf6592b
[clang][bytecode] Call CheckFinalLoad in all language modes (#154496)
Fixes #153997
2025-08-21 08:24:09 +02:00
Yi Kong
1ff7c8bf0d [compiler-rt] Fix musl build
The change in PR #154268 introduced a dependency on the `__GLIBC_PREREQ`
macro, which is not defined in musl libc. This caused the build to fail
in environments using musl.

This patch fixes the build by including
`sanitizer_common/sanitizer_glibc_version.h`. This header provides a
fallback definition for `__GLIBC_PREREQ` when LLVM is built against
non-glibc C libraries, resolving the compilation error.
2025-08-21 15:19:06 +09:00
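The general shape of such a fallback can be sketched as follows. This is not the contents of `sanitizer_glibc_version.h`; the fallback value of 0 and the helper `haveGlibcAtLeast_2_30` are assumptions chosen for illustration.

```cpp
#include <cstdlib> // on glibc, any libc header makes __GLIBC_PREREQ available

// Assumed fallback: on C libraries that do not define the macro (for example
// musl), treat every version check as "not satisfied".
#ifndef __GLIBC_PREREQ
#define __GLIBC_PREREQ(major, minor) 0
#endif

// Hypothetical helper showing that the check now compiles everywhere.
inline bool haveGlibcAtLeast_2_30() {
#if __GLIBC_PREREQ(2, 30)
  return true;
#else
  return false;
#endif
}
```

Without the fallback, the `#if __GLIBC_PREREQ(2, 30)` line is a preprocessor error on non-glibc systems because the macro is used with arguments but never defined.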
Sergei Barannikov
b96d5c2452
[TableGen][DecoderEmitter] Outline InstructionEncoding constructor (NFC) (#154673)
It is going to grow, so it makes sense to move its definition
out of class. Instead, inline `populateInstruction()` into it.
Also, rename a couple of methods to better convey their meaning.
2025-08-21 06:08:57 +00:00
Abhishek Kaushik
62aaa96d6f
[SDAG][X86] Added method to scalarize STRICT_FSETCC (#154486)
Fixes #154485
2025-08-21 11:27:27 +05:30
Carlos Galvez
3baddbbb0a
Do not trigger -Wmissing-noreturn on lambdas prior to C++23 (#154545)
Fixes #154493

Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>
2025-08-21 07:30:57 +02:00
Jim Lin
3a715107c2 [RISCV] Fold argstr into class for XSMTVDot instructions. NFC.
All of them use the same argstr "$vd, $vs1, $vs2".
2025-08-21 13:12:46 +08:00
Craig Topper
2d3d8df0e0 [RISCV] Use RVPTernary_rrr for a few more instructions.
This doesn't really affect the assembler, but will
be important when we eventually do codegen.
2025-08-20 21:13:40 -07:00
Sergei Barannikov
d6679d5a5f
[Target] Remove SoftFail field on targets that don't use it (NFC) (#154659)
That is, on all targets except ARM and AArch64.
This field used to be required due to a bug that was fixed long ago
by 23423c0ea8d414e56081cb6a13bd8b2cc91513a9.
2025-08-21 05:21:42 +03:00
Jordan Rupprecht
918c0ac762
[bazel] Port #154616: LDBG in ConvertToLLVMPass (#154661) 2025-08-20 21:21:14 -05:00
Jordan Rupprecht
90d601d50b
[bazel][LLVMIR] Port #145899: Add target attrs (#154660) 2025-08-21 01:59:41 +00:00
Sirui Mu
91569fa030
[CIR][NFC] Use Op::create to create CIR operations in CIRGenBuilder (#154540) 2025-08-21 09:46:45 +08:00
Aiden Grossman
c811f522f6
[ProfCheck] Add list of xfail tests (#154655)
This patch contains a list of tests that are currently failing in the
LLVM_ENABLE_PROFCHECK=ON build. This enables passing them to lit through
the LIT_XFAIL env variable. This is necessary for getting a buildbot
spun up to catch regressions while work is being done to fix the
existing issues.

We need to keep this in the LLVM tree so that tests can be removed from
the list at the same time the passes causing issues are fixed.

Issue #147390
2025-08-21 01:28:05 +00:00
Matt Arsenault
e414585545
AMDGPU: Add baseline test for mfma rewrite with phi (#153021) 2025-08-21 10:25:05 +09:00
Matt Arsenault
bcf41e03c7
AMDGPU: Add baseline test for vgpr mfma with copied-from AGPR (#153020) 2025-08-21 10:24:27 +09:00
Matt Arsenault
eefad7438c
AMDGPU: Handle rewriting VGPR MFMA to AGPR with subregister copies (#153019)
This should address the case where the result isn't fully used,
resulting in partial copy bundles from the MFMA result.
2025-08-21 01:17:03 +00:00
Jim Lin
fd28257195
[DAGCombiner] Fold umax/umin operations with vscale operands (#154461)
umax/umin operations with vscale operands can be constant
folded.
2025-08-21 09:15:40 +08:00
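One arithmetic identity that makes this kind of fold plausible is sketched below: because vscale is a positive runtime constant, scaling both operands by it does not change which one is larger, as long as nothing overflows. This is an illustration under that assumption, not a description of the exact patterns the DAGCombiner change matches; all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// vscale is unknown at compile time but always positive; model it as a
// parameter. Overflow is assumed not to occur for the inputs used here.
inline uint64_t scaled(uint64_t c, uint64_t vscale) {
  assert(vscale > 0 && "vscale is a positive multiplier");
  return c * vscale;
}

// umax/umin applied to two scaled values.
inline uint64_t umaxUnfolded(uint64_t c0, uint64_t c1, uint64_t vscale) {
  return std::max(scaled(c0, vscale), scaled(c1, vscale));
}
inline uint64_t uminUnfolded(uint64_t c0, uint64_t c1, uint64_t vscale) {
  return std::min(scaled(c0, vscale), scaled(c1, vscale));
}

// The same result with the comparison done on the constants first.
inline uint64_t umaxFolded(uint64_t c0, uint64_t c1, uint64_t vscale) {
  return scaled(std::max(c0, c1), vscale);
}
inline uint64_t uminFolded(uint64_t c0, uint64_t c1, uint64_t vscale) {
  return scaled(std::min(c0, c1), vscale);
}
```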
PiJoules
3c8652e737
[compiler-rt][Fuchsia] Change GetMaxUserVirtualAddress to invoke syscall (#153309)
LSan was recently refactored to call GetMaxUserVirtualAddress for
diagnostic purposes. This leads to failures for some of our downstream
tests which only run with lsan. This occurs because
GetMaxUserVirtualAddress depends on setting up shadow via a call to
__sanitizer_shadow_bounds, but shadow bounds aren't set for standalone
lsan because it doesn't use shadow. This updates the function to invoke
the same syscall used by __sanitizer_shadow_bounds calls for getting the
memory limit. Ideally this function would only be called once since we
only need to get the bounds once.

More context in https://fxbug.dev/437346226.
2025-08-20 18:06:19 -07:00
Craig Topper
8cb6bfe05a [RISCV] Reduce ManualCodeGen for RVV intrinsics with rounding mode. NFC
Operate directly on the existing Ops vector instead of copying to
a new vector. This is similar to what the autogenerated codegen
does for other intrinsics.
2025-08-20 17:53:46 -07:00
Matt Arsenault
744cd8a9c6
AMDGPU: Add some baseline test for mfma rewrite with subregister copies (#153018)
Currently only cases rooted at a full copy of an MFMA result are
handled. Prepare to relax that by adding tests with more intricate
subregister usage, to help work towards handling subregisters.
2025-08-21 00:39:39 +00:00
Matt Arsenault
156f3fce54
AMDGPU: Handle rewriting VGPR MFMAs with immediate src2 (#153016) 2025-08-21 09:09:24 +09:00
Matt Arsenault
3a0fa12752
DAG: Handle half spanning extract_subvector in type legalization (#154101)
Previously it would just assert if the extract needed elements from
both halves. Extract the individual elements from both halves and
create a new vector, as the simplest implementation. This could
try to do better and create a partial extract or shuffle (or
maybe that's best left for the combiner to figure out later).

Fixes secondary issue noticed as part of #153808
2025-08-21 00:05:12 +00:00
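The element-by-element strategy described above can be modelled on plain containers. This sketch uses `std::vector<int>` as a stand-in for the two halves of a split vector; `extractSpanning` is a hypothetical name and the code is unrelated to the actual SelectionDAG API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A wide vector has been split into `lo` and `hi` halves of equal length.
// Extract `count` elements starting at `start` (an index into the original,
// unsplit vector), taking each element from whichever half holds it.
inline std::vector<int> extractSpanning(const std::vector<int> &lo,
                                        const std::vector<int> &hi,
                                        std::size_t start, std::size_t count) {
  assert(lo.size() == hi.size() && "halves must have equal length");
  assert(start + count <= lo.size() + hi.size() && "range out of bounds");
  std::vector<int> result;
  result.reserve(count);
  for (std::size_t i = start; i < start + count; ++i)
    result.push_back(i < lo.size() ? lo[i] : hi[i - lo.size()]);
  return result;
}
```

A smarter implementation could copy the contiguous run from each half in one step, which corresponds to the partial extract or shuffle the message mentions as a possible improvement.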
Elvis Wang
d611a9ca15
[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. (#154482)
`VPEVLBasedIVPHIRecipe` is lowered to a scalar phi VPInstruction and
generates a scalar phi. The recipe therefore occupies only a scalar
register, just like other phi recipes.

This patch fixes the register usage for `VPEVLBasedIVPHIRecipe`, changing
it from vector to scalar, which more closely matches the generated vector
IR.

https://godbolt.org/z/6Mzd6W6ha shows that there are no register spills
when choosing `<vscale x 16>`.

Note that this test is basically copied from AArch64.
2025-08-21 07:39:01 +08:00
Shih-Po Hung
cf0e86118d
[VPlan] Handle canonical VPWidenIntOrFpInduction in branch-condition simplification (#153539)
SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a
few PHI recipes in the loop header. With more IV-step optimizations,
the canonical widen-canonical-iv can be replaced by a canonical
VPWidenIntOrFpInduction, which the pass did not handle, causing
regressions (missed simplifications).

This patch replaces canonical VPWidenIntOrFpInduction with a StepVector
in the vector preheader since the vector loop region only executes once.
2025-08-21 07:34:54 +08:00
Kazu Hirata
9aae8ef329
[Scalar] Use SmallPtrSet directly instead of SmallSet (NFC) (#154473)
I'm trying to remove the redirection in SmallSet.h:

template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

to make it clear that we are using SmallPtrSet. There are only a
handful of places that rely on this redirection.

This patch replaces SmallSet with SmallPtrSet where the element type is
a pointer.
2025-08-20 16:30:39 -07:00
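The redirection quoted in the message above is a partial specialization that inherits from the pointer-specific container. The sketch below reproduces that shape with hypothetical stand-ins (`GenericSet`, `PtrSet`, both backed by `std::set`) so it compiles without LLVM headers; it says nothing about the real containers' internals.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>

// Hypothetical stand-in for the pointer-specific container.
template <typename PtrT, unsigned N> class PtrSet {
  static_assert(std::is_pointer<PtrT>::value, "PtrSet only stores pointers");
  std::set<PtrT> items;

public:
  bool insert(PtrT p) { return items.insert(p).second; }
  std::size_t size() const { return items.size(); }
};

// Hypothetical stand-in for the general-purpose container.
template <typename T, unsigned N> class GenericSet {
  std::set<T> items;

public:
  bool insert(const T &v) { return items.insert(v).second; }
  std::size_t size() const { return items.size(); }
};

// The redirection: for pointer element types, the general container is
// just the pointer-specific one under another name.
template <typename PointeeType, unsigned N>
class GenericSet<PointeeType *, N> : public PtrSet<PointeeType *, N> {};

inline void redirectionExample() {
  int x = 0;
  GenericSet<int *, 4> viaRedirect; // silently a PtrSet<int *, 4>
  PtrSet<int *, 4> direct;          // what the patches spell out instead
  assert(viaRedirect.insert(&x) && direct.insert(&x));
  assert(!viaRedirect.insert(&x) && viaRedirect.size() == direct.size());
}
```

Spelling the pointer-specific type directly at each use site removes the need for the specialization and makes the container choice visible to the reader.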
Kazu Hirata
7be06dbd43
[lldb] Use SmallPtrSet directly instead of SmallSet (NFC) (#154472)
I'm trying to remove the redirection in SmallSet.h:

template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

to make it clear that we are using SmallPtrSet. There are only a
handful of places that rely on this redirection.

This patch replaces SmallSet with SmallPtrSet where the element type is
a pointer.
2025-08-20 16:30:31 -07:00
Kazu Hirata
8a5b6b302e
[flang] Use SmallPtrSet directly instead of SmallSet (NFC) (#154471)
I'm trying to remove the redirection in SmallSet.h:

template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

to make it clear that we are using SmallPtrSet. There are only a
handful of places that rely on this redirection.

This patch replaces SmallSet with SmallPtrSet where the element type is
a pointer.
2025-08-20 16:30:24 -07:00
Min-Yih Hsu
db0eceaa8b
[AMDGPU] Fix uncaught changes made by AMDGPUPreloadKernelArgumentsPass (#154645)
#153975 added a new test,
`test/CodeGen/AMDGPU/disable-preload-kernargs.ll`, that triggers an
assertion under `LLVM_ENABLE_EXPENSIVE_CHECKS` complaining about not
invalidating analyses even when the Pass made changes. The cause was
that the Pass only invalidates the analyses when the number of
explicit arguments is greater than zero, while some functions can be
removed even when there are no explicit arguments, hence the missed
invalidation.
2025-08-20 16:23:23 -07:00
Matt Arsenault
ff5f396dac
AMDGPU: Handle rewriting non-tied MFMA to AGPR form (#153015)
If src2 and dst aren't the same register, to fold a copy
to AGPR into the instruction we also need to reassign src2
to an available AGPR. All the other uses of src2 also need
to be compatible with the AGPR replacement in order to avoid
inserting other copies somewhere else.

Perform this transform after verifying that all other uses are
compatible with AGPR and that an AGPR is available at all points
(which effectively means rewriting a full chain of MFMAs and
loads/stores at once).
2025-08-21 08:16:56 +09:00