llvm-project

Author	SHA1	Message	Date
neonetizen	e11a31f4c7	[CIR][AArch64] Lower FP16 vduph lane intrinsics (#186955 ) From #185382 Lower `vduph_lane_f16` and `vduph_laneq_f16` to `cir::VecExtractOp` Tests moved from `v8.2a-neon-instrinsics-generic.c` to a new CIR-enabled test file. I tried following from notes made in #185852 (BF16)	2026-04-06 19:12:34 +01:00
SiliconA-Z	5c13d2f099	[ARM] Enable creation of ARMISD::CMN nodes (#163223 ) Map ARMISD::CMN to tCMN instead of armcmpz. Rename the cmn instructions to match this new reality. Please note that I do not have merge permissions.	2026-04-06 20:05:14 +02:00
Craig Topper	38034d42bd	[RISCV] Use EVT instead of MVT in compressShuffleOfShuffles. (#190636 ) For the test case I just grabbed a test that exercised this code path and made the VT non-simple. Fixes #190605.	2026-04-06 11:03:38 -07:00
Chinmay Deshpande	12e957fd7f	[AMDGPU][GISel] RegBankLegalize rules for amdgcn_inverse_ballot (#190629 )	2026-04-06 10:30:35 -07:00
Tomer Shafir	37801e9e99	[MCA] Enhance debug prints of processor resources (#190132 ) Previously, `computeProcResourceMasks()` would print resource masks in debug mode from multiple call sites, creating noise in the debug output. This patch aims to fix this and also print more info about the resources. It splits the resource debug prints into 2 types: 1. No simulation - mask only 2. Simulation - mask + other info For 2, it centralizes printing in a single place, the `ResourceManager` constructor, which should cover all the other simulation cases indirectly: 1. `llvm/lib/MCA/HardwareUnits/ResourceManager` - covered 2. `llvm/lib/MCA/InstrBuilder.cpp` - should be covered indirectly - only used by `llvm-mca` before simulation that constructs a `ResourceManager` 3. `llvm/tools/llvm-mca/Views/SummaryView.cpp` - after simulation that constructs a `ResourceManager` 4. `llvm/tools/llvm-mca/Views/BottleneckAnalysis.cpp` - after simulation that constructs a `ResourceManager` It also adds `BufferSize` to the output, which should be useful to debug scheduling model + MCA integration. For 1, it inlines mask-only printing into 2 other callers: 1. `llvm/include/llvm/MCA/Stages/InstructionTables.h` 2. `llvm/tools/llvm-exegesis/lib/SchedClassResolution.cpp` as they only use the masks there. I think this is a reasonable duplication across distinguishably different users/tools. Now every pair of callers, even across groups (1 and 2), effectively prints in a mutually exclusive way. The patch adds debug tests for the 3 new callers, in the corresponding root test directories, to drive further location of logically target-independent tests that just require some target at the root. I think this convention is more discoverable, and is pretty widely used in the project.	2026-04-06 20:27:18 +03:00
Arthur Eubanks	72d4ce9889	[Inliner] Put inline history into IR as !inline_history metadata (#190092 ) So that it's preserved across all inline invocations rather than just one inliner pass run. This prevents cases where devirtualization in the simplification pipeline uncovers inlining opportunities that should be discarded due to inline history, but we dropped the inline history between inliner pass runs, causing code size to blow up, sometimes exponentially. For compile time reasons, we want to limit this to only call sites that have the potential to inline through SCCs, potentially with the help of devirtualization. This means that the callee is in a non-trivial (Ref)SCC, or the call site was previously an indirect call, which can potentially be devirtualized to call any function. The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant even with this change, as monster_scc.ll blows up if I remove that code. http://llvm-compile-time-tracker.com/compare.php?from=e830d88e8ae5f44a97cc76136a0a4e83aa9157c0&to=ed535e732fc41b79ab8efda2417886cbd0812f7f&stat=instructions:u Fixes #186926.	2026-04-06 10:24:41 -07:00
vangthao95	eb065bf028	AMDGPU/GlobalISel: RegBankLegalize rules for G_EXTRACT_VECTOR_ELT (#189144 )	2026-04-06 10:22:11 -07:00
Andrzej Warzyński	38c53b3eb9	[clang][cir][nfc] Fix comments, add missing EOF (#190623 )	2026-04-06 18:06:57 +01:00
Craig Topper	b44d2c977c	[RISCV] Use a vector MemVT when converting store+extractelt into a vector store. (#190107 ) This is needed so that `allowsMemoryAccessForAlignment` checks for unaligned vector memory support instead of unaligned scalar memory support when called from `RISCVTargetLowering::expandUnalignedVPStore` While there remove incorrect setting of the truncating store flag on the vector instruction. And restrict the transform to simple stores since we don't have tests for volatile or atomic. Fixes #189037	2026-04-06 09:58:04 -07:00
Craig Topper	0d14772a91	[RISCV][P-ext] Add isel patterns for macc.h00/macc.w00. (#190444 ) The RV32 macc.h00 instructions take the lower half words from rs1 and rs2, compute the full word product by extending the inputs, and add to rd. The RV64 macc.w00 is similar but operates on words and produces a double word result. I've restricted this to the case where the multiply has a single use. We don't have a general macc that multiplies the full xlen bits of rs1 and rs2, so I'm allowing the input to be sext_inreg/and or have sufficient sign/zero bits according to ComputeNumSignBits/computeKnownBits. We should also add mul.h00/mul.w00 patterns, but those we should restrict to at least one input being sext_inreg/and and prefer regular mul when there are no sext_inreg/and.	2026-04-06 09:57:29 -07:00
Wooseok Lee	0bef4c7aab	[AMDGPU] Add v2i32 and/or patterns for VOP3 AND_OR and OR3 operations (#188375 ) Add ThreeOp_v2i32_Pats pattern class to support v2i32 vector operations for AND_OR_B32 and OR3_B32 instructions. The new patterns check the v2i32 and-or or or-or instruction sequence, extract individual 32-bit elements from v2i32 operands, and apply the and_or or or3 vop3 operations.	2026-04-06 16:54:21 +00:00
Domenic Nutile	5b33f85a08	[AMDGPU] Change isSingleLaneExecution to account for WWM enabling lanes even if there's only one workitem (#188316 ) This issue was discovered during some downstream work around Vulkan CTS tests, specifically `dEQP-VK.subgroups.arithmetic.compute.subgroupadd_float` --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2026-04-06 12:51:46 -04:00
forking-google-bazel-bot[bot]	e7ac60c56b	[Bazel] Fixes ce1a9fd (#190577 ) This fixes ce1a9fd76640929fe340c5c5d1bb493ea09ca9bc. Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>	2026-04-06 09:40:22 -07:00
Valentin Clement (バレンタインクレメン)	baa1e5008b	[flang][cuda] Do not consider kernel result as host variable (#190626 )	2026-04-06 16:39:38 +00:00
adams381	9265f9284c	[mlir][ABI] Add writable, dead_on_unwind, dead_on_return, nofpclass param attrs to LLVM dialect (#188374 ) The MLIR LLVM dialect is missing support for several parameter attributes that exist in LLVM IR: `writable`, `dead_on_unwind`, `dead_on_return`, and `nofpclass`. This adds them to the kind-to-name mapping in `AttrKindDetail.h` and the corresponding name accessors in `LLVMDialect.td`. The existing generic conversion infrastructure in `ModuleTranslation` and `ModuleImport` picks them up automatically — `writable` and `dead_on_unwind` round-trip as `UnitAttr`, while `dead_on_return` and `nofpclass` round-trip as `IntegerAttr`. CIR needs these to match classic codegen's ABI output (sret gets `writable dead_on_unwind`, indirect args get `dead_on_return`, fast-math FP args get `nofpclass`).	2026-04-06 11:26:11 -05:00
Henrich Lauko	348295ac05	[CIR] Use data size in emitAggregateCopy for overlapping copies (#186702 ) Add skip_tail_padding property to cir.copy to handle potentially-overlapping subobject copies directly, instead of falling back to cir.libc.memcpy. When set, the lowering uses the record's data size (excluding tail padding) for the memcpy length. This keeps typed semantics and promotability of cir.copy. Also fix CXXABILowering to preserve op properties when recreating operations, and expose RecordType::computeStructDataSize() for computing data size of padded record types.	2026-04-06 18:24:10 +02:00
Eric Feng	930ef7736e	[mlir][amdgpu] Add optional write mask to amdgpu.global_load_async_to_lds (#190498 )	2026-04-06 09:21:32 -07:00
Ruoyu Qiu	06e666a8f6	[DA] Add overflow test for BanerjeeMIVtest (#190468 )	2026-04-06 16:02:56 +00:00
albertbolt1	8d7823ea8f	[CIR][AArch64] Added vector intrinsics for shift left (#187516 ) Added vector intrinsics for vshlq_n_s8, vshlq_n_s16, vshlq_n_s32, vshlq_n_s64, vshlq_n_u8, vshlq_n_u16, vshlq_n_u32, vshlq_n_u64, vshl_n_s8, vshl_n_s16, vshl_n_s32, vshl_n_s64, vshl_n_u8, vshl_n_u16, vshl_n_u32, vshl_n_u64. These cover all the vector intrinsics for constant shift left. The method followed: 1) Quad-word vectors have the form `64x2`, `32x4`, `16x8`, or `8x16`, and the shift amount is a constant, but shift left needs both operands to be vectors, so the constant shift is converted into a vector of the matching form (for `64x2`, the constant becomes a `64x2` vector); this process is also called a splat. 2) After the splat, the lhs and rhs have the same size, so the shift left can be applied. 3) There is one issue: ops[0] is not of the right size. For quad words it falls back to the default int8*16 in the function, so it is converted to the required size by bitcasting; `8x16` and `64x2` have the same total width, so the bitcast yields the vector in the right form. Wrote test cases for all the intrinsics listed above. #185382	2026-04-06 17:00:38 +01:00
Ryotaro Kasuga	34a16392fa	[DA] Use SmallVector instead of raw new/delete (NFC) (#190586 ) Some functions used `new`/`delete` to allocate/free arrays. To avoid memory leaks, it would be better to avoid using raw pointers. This patch replaces the use of them with `SmallVector`.	2026-04-06 15:54:34 +00:00
Krzysztof Parzyszek	4994a97135	[flang][OpenMP] Remove namespace qualification from GetUpperName, NFC (#190619 ) This applies to flang/lib/Semantics/openmp-utils.cpp, since it contains `using namespace Fortran::parser::omp`.	2026-04-06 10:49:25 -05:00
Matt Arsenault	bf2a97a0dd	AMDGPU: Add range attribute to mbcnt intrinsic callsites (#189191 ) It seems the known bits handling added in 686987a540bc176bceaad43ffe530cb3e88796d5 is insufficient to perform many range based optimizations. For some reason computeConstantRange doesn't fall back on KnownBits, and has a separate, less used form which tries to use computeKnownBits.	2026-04-06 14:40:54 +00:00
Erich Keane	297a70c9b5	[CIR] Implement global decomposition declarations (#190364 ) No real challenge to these, it is effectively a copy/paste of the classic codegen as it just requires we properly emit the holding variable. The rest falls out of the rest of our handling of variables.	2026-04-06 07:38:21 -07:00
Max Graey	c4281fd5af	[Support][ValueTracking] Improve KnownFPClass for fadd. Handle infinity signs (#190559 ) Improve KnownFPClass reasoning for fadd: - Refine NaN handling for infinities by checking opposite-sign cases: - `-inf` + `+inf` --> `nan` - `+inf` + `-inf` --> `nan` - `+inf` + `+inf` --> `+inf` - `-inf` + `-inf` --> `-inf` - Introduce `cannotBeOrderedLessEqZero` as pair to `cannotBeOrderedGreaterEqZero`.	2026-04-06 16:23:20 +02:00
Timm Baeder	59e899e16b	[clang][bytecode] Don't unref constexpr-unknown references (#190177 ) If the pointer for a reference is constexpr-unknown, use the pointer itself instead of dereferencing it. Unfortunately, that means constexpr-unknown pointers reach a lot more places than before.	2026-04-06 15:52:17 +02:00
Joe Nash	2ccc941549	[AMDGPU] Mark two instructions as DPMACC (#190391 ) It appears these were accidentally missed in #170319	2026-04-06 13:43:35 +00:00
Lei Huang	74ad441a80	Split DWARF v2 tests to exclude 64-bit AIX targets (#189077 ) 64-bit AIX requires DWARF64 format, which was only introduced in DWARF v3. DWARF v2 only supports 32-bit DWARF format, making it incompatible with 64-bit AIX (the compiler throws a fatal error). These changes split DWARF v2 tests into separate files that exclude 64-bit AIX targets while still running on 32-bit AIX and other 64-bit platforms where DWARF v2 is supported.	2026-04-06 09:21:31 -04:00
Trung Nguyen	b6e7c475cb	[CodeGen] Ignore `ANNOTATION_LABEL` in scheduler (#190499 ) This fixes a crash in `clang` for `armv7` targets when optimizations are enabled. Fixes #190497	2026-04-06 14:16:01 +02:00
Florian Hahn	0403639667	[VPlan] Skip successors outside any loop when updating LoopInfo. (#190553 ) Successors outside of any loop do not contribute to the innermost loop, skip them to avoid incorrect results due to getSmallestCommonLoop(nullptr, X) returning nullptr.	2026-04-06 12:58:41 +01:00
陈子昂	05ff170026	[InstCombine] Fix #163110 : Support peeling off matching shifts from icmp operands via canEvaluateShifted (#165975 ) Consider a pattern like `icmp (shl nsw X, L), (add nsw (shl nsw Y, L), K)`. When the constant K is a multiple of 2^L, this can be simplified to `icmp X, (add nsw Y, K >> L)`. This patch extends canEvaluateShifted to support `Instruction::Add` and updates its signature to accept `Instruction::BinaryOps` instead of a boolean. This change allows the function to distinguish between LShr and AShr requirements, ensuring that information is preserved according to the signedness and overflow flags (nsw/nuw) of the operands. The logic is integrated into `foldICmpCommutative` to enable peeling off matching shifts from both sides of a comparison even when an offset is present. Fixes: #163110	2026-04-06 13:44:17 +02:00
Nico Weber	3b0221090c	[gn] fix mistake from 88f6b181b6ab2 (#190601 )	2026-04-06 07:23:07 -04:00
Florian Hahn	64a0bd1227	[LV] Return best VPlan together with VF from computeBestVF (NFC). (#190385 ) computeBestVF iterates over all VPlans and picks the VF of the most profitable VPlan. This VPlan is later needed for execution and additional checks. Instead of retrieving it multiple times later, just directly return it from computeBestVF. This removes some redundant lookups. PR: https://github.com/llvm/llvm-project/pull/190385	2026-04-06 11:01:18 +01:00
Nishant Sachdeva	4cce6f85fb	[llvm-ir2vec] Added Enum for ir2vec embedding mode (#190466 ) Currently, the initEmbedding() takes mode as an input. This input is a string input. This PR introduces a patch to take the input as an enum value.	2026-04-06 14:05:00 +05:30
Florian Hahn	f7cdebb478	[VPlan] Mark unary ops as not having side-effects (NFC). (#190554 ) Mark unary ops (only FNeg currently) to neither read nor write memory, similar to binary and cast ops. Should currently be NFC end-to-end.	2026-04-06 09:05:38 +01:00
Srinivasa Ravi	63231ebfe7	[MLIR][NVVM] Add new narrow FP convert Ops (#184291 ) This change adds the following NVVM Ops for new narrow FP conversions introduced in PTX 9.1: - `convert.{f32x2/bf16x2}.to.s2f6x2` - `convert.s2f6x2.to.bf16x2` - `convert.bf16x2.to.f8x2` (extended for `f8E4M3FN` and `f8E5M2` types) - `convert.{f16x2/bf16x2}.to.f6x2` - `convert.{f16x2/bf16x2}.to.f4x2` PTX ISA Reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt	2026-04-06 12:06:25 +05:30
Baranov Victor	e326ff2a88	[clang-tidy] Fix FP on cppcoreguidelines-pro-type-member-init with forward decl (#190521 ) Fixes https://github.com/llvm/llvm-project/issues/155416.	2026-04-06 08:42:24 +03:00
lonely eagle	ce1a9fd766	Reland "[mlir][reducer] Add eraseRedundantBlocksInRegion and getSuccessorForwardOperands API to BranchOpInterface" (#189253 ) The undefined symbol and memory leak issues have been fixed (see the previous attempt at https://github.com/llvm/llvm-project/pull/189150), so this PR relands the original change (https://github.com/llvm/llvm-project/pull/187864).	2026-04-06 12:54:15 +08:00
Yashwant Singh	5e14916fa6	Early exit llvm-bolt when coming across empty data files (#176859 ) perf2bolt generates empty fdata files for small binaries and right now BOLT does this check while parsing by calling `((!hasBranchData() && !hasMemData()))`. Instead, early exit as soon as the buffer finishes reading the data file and exit with error message.	2026-04-06 09:37:05 +05:30
Michael Kruse	26697f4d07	[Polly] Correct integer comparison bit width (#190493 ) For making an integer comparable to bool, don't compare it to bool. Bug occurred during the reduction of #190459	2026-04-06 01:09:51 +00:00
Wenju He	1839b755dd	[runtimes] Skip custom linker validation for gpu/offload targets (#189933 ) This fixes the `Host compiler does not support '-fuse-ld=lld'` error when cross-building libclc for a gpu target. The CMake configure command is: -DRUNTIMES_amdgcn-amd-amdhsa-llvm_LLVM_ENABLE_RUNTIMES=libclc \ -DLLVM_RUNTIME_TARGETS="amdgcn-amd-amdhsa-llvm" libclc targets only support offload target cross-builds and can't link a host executable. The configuration error is a false positive for offload. This PR adds a baseline test to first check whether the target can link an executable. If it fails (typical for gpu/offload), we skip the custom linker validation.	2026-04-06 07:18:36 +08:00
Florian Hahn	58208a0cc1	[LV] Additional epilogue tests for find-iv and with uses of IV.(NFC) (#190548 ) Additional test coverage for loops not yet supported, with sinkable find-iv expressions (github.com/llvm/llvm-project/pull/183911) and uses of the IV. PR: https://github.com/llvm/llvm-project/pull/190548	2026-04-05 20:42:11 +00:00
Florian Hahn	c109dd1e9a	[VPlan] Refactor FindLastSelect matching to use m_Specific(PhiR) (NFC). (#190547 ) Match the select operands directly against PhiR using m_Specific, binding only the non-phi IV expression. This replaces the generic TrueVal/FalseVal matching followed by an assert and conditional extraction. Split off from approved https://github.com/llvm/llvm-project/pull/183911/ as suggested.	2026-04-05 20:07:34 +00:00
Hicham Omari	4bd1facaed	[llvm][docs] Fix typo (#190150 ) This commit corrects a typo in the project documentation.	2026-04-05 21:00:25 +01:00
Samuel Thibault	9ce30c8dc3	[Orc][LibResolver] Fix GNU/Hurd build (#184470 ) GNU/Hurd does not put a PATH_MAX static constraint on path lengths. We can instead check the symlink length.	2026-04-05 19:56:31 +01:00
Sergei Barannikov	11e7a49a58	[lldb] Remove VMRange class (NFC) (#190475 ) We have a template class `Range` that provides similar functionality and is much more widely used.	2026-04-05 18:39:44 +00:00
Sergei Barannikov	f8e394b6f8	[lldb] Fix section offset of synthesized entry point symbol (#190348 ) In the non-ARM case, the offset was left unset, so the symbol synthesized for the entry point pointed to the start of the containing section. As a drive-by change, simplify offset adjustment in ARM case.	2026-04-05 21:39:24 +03:00
Jonas Devlieghere	353ab41001	[lldb] Update error message in SocketTest::CreatePair (#190544 )	2026-04-05 18:23:36 +00:00
Jonas Devlieghere	f866ef202c	[lldb] Bring more diagnostics in compliance with our coding standards (#190410 ) The LLVM Coding Standards [1] specify that: > [T]o match error message styles commonly produced by other tools, > start the first sentence with a lowercase letter, and finish the last > sentence without a period, if it would end in one otherwise. Historically, that hasn't been something we've enforced in LLDB, but in the past year or so I've started to pay more attention to this in code reviews. This PR brings more error messages in compliance, further increasing consistency. I also adopted `createStringErrorV` where it improved the code as a drive-by for lines I was already touching. [1] https://llvm.org/docs/CodingStandards.html#error-and-warning-messages Assisted-by: Claude Code	2026-04-05 10:41:47 -07:00
Florian Hahn	36e495dd90	[VPlan] Use APSInt in CheckSentinel directly (NFC). (#190534 ) Simplify the sentinel checking logic by using APSInt and checking for both a signed and unsigned sentinel in a single call. Removes the IsSigned argument. Split off from approved https://github.com/llvm/llvm-project/pull/183911/ as suggested.	2026-04-05 16:43:59 +00:00
Florian Hahn	a2c16bb59f	[VPlan] Rename CondSelect to FindLastSelect (NFC). (#190536 ) …ns (NFC). Use the more descriptive name FindLastSelect for the conditional select that picks between the reduction phi and the IV value. Split off from approved https://github.com/llvm/llvm-project/pull/183911/ as suggested.	2026-04-05 16:39:34 +00:00

575643 Commits