llvm-project

Author	SHA1	Message	Date
Benjamin Maxwell	135ddf1e8e	[AArch64][SVE] Add basic support for `@llvm.masked.compressstore` (#168350 ) This patch adds SVE support for the `masked.compressstore` intrinsic via the existing `VECTOR_COMPRESS` lowering and compressing the store mask via `VECREDUCE_ADD`. Currently, only `nxv4[i32|f32]` and `nxv2[i64|f64]` are directly supported, with other types promoted to these, where possible. This is done in preparation for LV support of this intrinsic, which is currently being worked on in #140723.	2025-11-28 10:17:36 +00:00
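Here is a rough scalar model of the intrinsic's semantics, not of the SVE lowering. A compress-store packs the active lanes to the front and writes only those. The number of elements written is the count of set mask bits, which an add-reduction of the mask (as `VECREDUCE_ADD` computes) yields. The sketch uses a fixed lane count `N`, whereas SVE vectors are scalable, and the name `compress_store` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of llvm.masked.compressstore for a fixed-length vector of
// N lanes (a simplification: SVE vectors are scalable).
// Active lanes (mask[i] == true) are written to consecutive memory slots
// starting at `out`; inactive lanes are skipped and nothing is written
// for them. Returns the number of elements stored.
template <typename T, std::size_t N>
std::size_t compress_store(const std::array<T, N> &vec,
                           const std::array<bool, N> &mask, T *out) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i]) {
      out[count++] = vec[i]; // pack active lanes to the front
    }
  }
  return count; // same value as an add-reduction over the mask bits
}
```

For example, with lanes `{10, 20, 30, 40}` and mask `{1, 0, 1, 1}`, three elements (`10, 30, 40`) are stored contiguously and the fourth slot is left untouched.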
Alex Bradbury	fd19a20a1a	Revert "[ShrinkWrap] Modify shrink wrapping to accommodate functions terminated by no-return blocks" (#169852 ) Reverts llvm/llvm-project#167548 As commented at https://github.com/llvm/llvm-project/pull/167548#issuecomment-3587008602 this is causing miscompiles in two-stage RISC-V Clang/LLVM builds that result in test failures on the builders.	2025-11-27 19:12:56 +00:00
Nathan Corbyn	650eeb867f	[ShrinkWrap] Modify shrink wrapping to accommodate functions terminated by no-return blocks (#167548 ) At present, the shrink wrapping pass misses opportunities to shrink wrap in the presence of machine basic blocks which exit the function without returning. Such cases arise from C++ functions like the following:
```cxx
int foo(int err, void* ptr) {
  if (err == -1) {
    if (ptr == nullptr) {
      throw MyException("Received `nullptr`!", __FILE__, __LINE__);
    }
    handle(ptr);
  }
  return STATUS_OK;
}
```
In particular, assuming `MyException`'s constructor is not marked `noexcept`, the above code will generate a trivial EH landing pad calling `__cxa_free_exception()` and rethrowing the unhandled internal exception, exiting the function without returning. As such, the shrink wrapping pass refuses to touch the above function, spilling to the stack on every call, even though no CSRs are clobbered on the hot path. This patch tweaks the shrink wrapping logic to enable the pass to fire in this and similar cases.	2025-11-27 10:17:25 +00:00
Peter Collingbourne	6227eb90da	Add IR and codegen support for deactivation symbols. Deactivation symbols are a mechanism for allowing object files to disable specific instructions in other object files at link time. The initial use case is for pointer field protection. For more information, see the RFC: https://discourse.llvm.org/t/rfc-deactivation-symbols/85556 Reviewers: ojhunt, nikic, fmayer, arsenm, ahmedbougacha Reviewed By: fmayer Pull Request: https://github.com/llvm/llvm-project/pull/133536	2025-11-26 12:37:09 -08:00
Matt Arsenault	9b88cd9945	CodeGen: Remove PointerLikeRegClass handling from codegen (#159883 ) All uses have been migrated to RegClassByHwMode. This is now an implementation detail of InstrInfoEmitter for pseudoinstructions.	2025-11-26 10:14:37 -05:00
daniilavdeev	cc1c41724d	[dwarf] make dwarf fission compatible with RISCV relaxations 2/2 (#164813 ) This patch makes DWARF fission compatible with RISC-V relaxations by using indirect addressing for the DW_AT_high_pc attribute. This eliminates the remaining relocations in .dwo files.	2025-11-26 15:36:02 +03:00
daniilavdeev	5f777b2c8f	[dwarf] make dwarf fission compatible with RISCV relaxations 1/2 (#166597 ) Currently, -gsplit-dwarf and -mrelax are incompatible options in Clang. The issue is that .dwo files should not contain any relocations, as they are not processed by the linker. However, relaxable code emits relocations in DWARF for debug ranges that reside in the .dwo file when DWARF fission is enabled. This patch makes DWARF fission compatible with RISC-V relaxations. It uses the StartxEndx DWARF forms in .debug_rnglists.dwo, which allow referencing addresses from .debug_addr instead of using absolute addresses. This approach eliminates relocations from .dwo files.	2025-11-26 00:52:22 +03:00
Matt Arsenault	1c5b1501ca	CodeGen: Move libcall lowering configuration to subtarget (#168621 ) Previously libcall lowering decisions were made directly in the TargetLowering constructor. Pull these into the subtarget to facilitate turning LibcallLoweringInfo into a separate analysis in the future.	2025-11-25 11:59:56 -05:00
Drew Kersnar	17852deda7	[NVPTX] Lower LLVM masked vector loads and stores to PTX (#159387 ) This backend support will allow the LoadStoreVectorizer, in certain cases, to fill in gaps when creating load/store vectors and generate LLVM masked load/stores (https://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics). To accomplish this, changes are separated into two parts. This first part has the backend lowering and TTI changes, and a follow-up PR will have the LSV generate these intrinsics: https://github.com/llvm/llvm-project/pull/159388. In this backend change, masked loads get lowered to PTX with `#pragma "used_bytes_mask" [mask];` (https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask), and masked stores get lowered to PTX using the new sink symbol syntax (https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st).
# TTI Changes
TTI changes are needed because NVPTX only supports masked loads/stores with _constant_ masks. `ScalarizeMaskedMemIntrin.cpp` is adjusted to check that the mask is constant and pass that result into the TTI check. Behavior shouldn't change for non-NVPTX targets, which do not care whether the mask is variable or constant when determining legality, but all TTI files that implement these APIs need to be updated.
# Masked store lowering implementation details
If the masked stores make it to the NVPTX backend without being scalarized, they are handled by the following:
* `NVPTXISelLowering.cpp` - Sets up a custom operation action and handles it in lowerMSTORE. Similar handling to normal store vectors, except we read the mask and place a sentinel register `$noreg` in each position where the mask reads as false.
  For example,
```
t10: v8i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>
t11: ch = masked_store<(store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)> t5:1, t5, t7, undef:i64, t10
->
STV_i32_v8 killed %13:int32regs, $noreg, $noreg, killed %16:int32regs, killed %17:int32regs, $noreg, $noreg, killed %20:int32regs, 0, 0, 1, 8, 0, 32, %4:int64regs, 0, debug-location !18 :: (store unknown-size into %ir.lsr.iv28, align 32, addrspace 1);
```
* `NVPTXInstrInfo.td` - Changes the definition of store vectors to allow for a mix of sink symbols and registers.
* `NVPTXInstPrinter.h/.cpp` - Handles the `$noreg` case by printing "_".
# Masked load lowering implementation details
Masked loads are routed to normal PTX loads, with one difference: a `#pragma "used_bytes_mask"` is emitted before the load instruction (https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask). To accomplish this, a new operand representing this mask is added to every NVPTXISD Load type.
* `NVPTXISelLowering.h/.cpp` - Masked loads are converted into normal NVPTXISD loads with a mask operand in two ways: 1) in type legalization through replaceLoadVector, which is the normal path, and 2) through LowerMLOAD, to handle the legal vector types (v2f16/v2bf16/v2i16/v4i8/v2f32) that will not be type legalized. Both share the same convertMLOADToLoadWithUsedBytesMask helper. Both default this operand to UINT32_MAX, representing all bytes on. For the latter, we need a new `NVPTXISD::MLoadV1` type to represent that edge case because we cannot put the used bytes mask operand on a generic LoadSDNode.
* `NVPTXISelDAGToDAG.cpp` - Extracts the used bytes mask from loads and adds it to the created machine instructions.
* `NVPTXInstPrinter.h/.cpp` - Prints the pragma when the used bytes mask isn't all ones.
* `NVPTXForwardParams.cpp`, `NVPTXReplaceImageHandles.cpp` - Update manual indexing of load operands to account for the new operand.
* `NVPTXInstrInfo.td`, `NVPTXIntrinsics.td` - Add the used bytes mask to the MI definitions.
* `NVPTXTagInvariantLoads.cpp` - Ensure that masked loads also get tagged as invariant.
Some generic changes that are needed:
* `LegalizeVectorTypes.cpp` - Ensure flags are preserved when splitting masked loads.
* `SelectionDAGBuilder.cpp` - Preserve `MD_invariant_load` on masked load SDNode creation.	2025-11-25 10:26:15 -06:00
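The store side of this change can be pictured with a small formatter. Active lanes print their register, and masked-off lanes print the sink symbol `_`, which matches how the commit describes `$noreg` being printed. The helper name `format_store_operands` and the exact brace-delimited operand spelling are assumptions for illustration, not NVPTX backend code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: render the operand list of a vector store. Lanes
// whose mask bit is false become the PTX sink symbol "_" (standing in for
// the $noreg sentinel); active lanes print their register name.
std::string format_store_operands(const std::vector<std::string> &regs,
                                  const std::vector<bool> &mask) {
  std::string out = "{";
  for (std::size_t i = 0; i < regs.size(); ++i) {
    if (i != 0)
      out += ", ";
    out += mask[i] ? regs[i] : std::string("_");
  }
  out += "}";
  return out;
}
```

With the 8-lane mask from the commit's example (`-1, 0, 0, -1, -1, 0, 0, -1`), lanes 1, 2, 5 and 6 come out as `_`.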
Sander de Smalen	e1b08731e5	Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"" This reverts commit bb78728826ff57f3df859e79bfd857b5a175bb6d.	2025-11-25 11:01:27 +00:00
Sander de Smalen	bb78728826	Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" A SUBREG_TO_REG instruction expresses that the top bits of the result register are set to a certain value (e.g. 0). The example below expresses that the result of %1 will have the top 32 bits zeroed and the lower 32 bits equal to the result of INSTR.
```
%0:gpr32 = INSTR
%1:gpr64 = SUBREG_TO_REG 0, %0, sub32
```
When the RegisterCoalescer tries to remove SUBREG_TO_REG instructions by coalescing %0 into %1, it must keep the same semantics. Currently however, the RegisterCoalescer would emit:
```
%1.sub32:gpr64 = INSTR
```
which no longer expresses that the top 32 bits of the register are defined (zeroed) by INSTR. This may cause issues with e.g. machine copy propagation, where the pass may think it can remove a COPY-like instruction because the MIR says only the bottom 32 bits are defined/used, even though other uses of the register rely on the top 32 bits being zeroed by the COPY-like instruction. This PR changes the RegisterCoalescer to instead emit:
```
undef %1.sub32:gpr64 = MOVimm32 42, implicit-def %1
```
to express that the entire contents of %1:gpr64 are defined by the instruction. This tries to reland #134408, which had to be reverted due to a few reported failures.	2025-11-24 15:55:19 +00:00
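A scalar C++ model of the two readings makes the semantic gap concrete. `SUBREG_TO_REG 0` defines all 64 bits with the top half zero. A write that defines only `sub32` says nothing about the top half; as an assumption, this sketch models that by leaving whatever `prev` held in place. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models "%1:gpr64 = SUBREG_TO_REG 0, %0, sub32": the low 32 bits come
// from the 32-bit value and the top 32 bits are defined to be zero.
uint64_t subreg_to_reg_zero(uint32_t lo) {
  return static_cast<uint64_t>(lo);
}

// Models a bare "%1.sub32:gpr64 = INSTR": only the low 32 bits are
// written, so the top half keeps whatever `prev` held. A pass reasoning
// from this form cannot assume the top bits are zero.
uint64_t write_sub32_only(uint64_t prev, uint32_t lo) {
  return (prev & 0xFFFFFFFF00000000ULL) | lo;
}
```

If the top half of `prev` happens to be non-zero, the two results differ, which is exactly the information the added `implicit-def %1` preserves.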
hstk30-hw	a6cec3f3e5	Reland "[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661 )" (#169219 ) Reland d5f3ab8ec97786476a077b0c8e35c7c337dfddf2, fix testcases.	2025-11-24 09:27:25 +08:00
Aiden Grossman	d5f3ab8ec9	Revert "[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661 )" This reverts commit 0859ac5866a0228f5607dd329f83f4a9622dedcc. This caused a couple test failures, likely due to a mid-air collision. Reverting for now to get the tree back to green and allow the original author to run UTC/friends and verify the output.	2025-11-23 05:17:45 +00:00
hstk30-hw	0859ac5866	[RegAlloc] Fix the terminal rule check for interfere with DstReg (#168661 ) This may be a bug introduced by commit 6749ae36b4a33769e7a77cf812d7cd0a908ae3b9 that has been present ever since. In this case, `OtherReg` always overlaps with `DstReg` because both come from the same `Copy`.	2025-11-23 10:11:24 +08:00
Kazu Hirata	b296386d2c	[llvm] Use llvm::equal (NFC) (#169173 ) While I am at it, this patch uses const l-value references for std::shared_ptr. We don't need to increment the reference count by passing std::shared_ptr by value. Identified with llvm-use-ranges.	2025-11-22 15:30:43 -08:00
Aiden Grossman	ad7a5d4e05	[CallBrPrepare] Prefer Function &F over Function &Fn Function &F is the more standard abbreviation (~4000 uses in llvm versus ~300 uses).	2025-11-22 07:37:20 +00:00
Hongyu Chen	3fec26e329	[DAGCombiner] Don't optimize insert_vector_elt into shuffle if implicit truncation exists (#169022 ) Fixes #169017	2025-11-22 03:33:53 +08:00
Matt Arsenault	1d73b68463	TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744 ) Query RuntimeLibcalls for the support and the name. The check that the implementation is exactly __guard_local instead of unsupported feels a bit strange.	2025-11-20 20:44:25 -05:00
Craig Topper	01e5e4fd00	[DAGCombiner] Remove unneeded m_BitReverse from visitBITREVERSE. NFC (#168918 ) We already know we're looking at BITREVERSE, we can match on the source operand.	2025-11-20 18:20:47 +00:00
Matt Arsenault	0e1cb2de90	Reapply "DAG: Allow select ptr combine for non-0 address spaces" (#168292 ) (#168786 ) This reverts commit 6d5f87fc4284c4c22512778afaf7f2ba9326ba7b. Previously this failed due to treating the unknown MachineMemOperand value as known uniform.	2025-11-20 12:13:46 -05:00
Ramkumar Ramachandra	602fa0c7ce	[SDAG] Fix whitespace errors (NFC) (#168897 ) To make life easier for future contributors. Note that formatting changes are due to git clang-format on the touched whitespace-error lines.	2025-11-20 16:44:31 +00:00
Jeremy Morse	2cf550a040	[DebugInfo] Force early line-zero calls to have meaningful locations (#156850 ) In functions that have been seriously deformed during optimisation, there can be call instructions with line-zero immediately after frame setup (see C reproducer in the test added). Our previous algorithms for prologue_end ignored these, meaning someone entering a function at prologue_end would break-in after a function call had completed. Prefer instead to place prologue_end and the function scope-line on the line zero call: this isn't false (it's the first meaningful instruction of the function) and is approximately true. Given a less than ideal function, this is an OK solution.	2025-11-20 10:20:47 +00:00
Craig Topper	8608344778	[CFIInserter] Turn a reachable llvm_unreachable into a report_fatal_error. (#168777 ) This prevents it from being optimized out in non-asserts builds. Update X86 test to remove REQUIRES: asserts and check for LLVM ERROR. Add FileCheck to RISC-V test and remove UNSUPPORTED. This is the more complete fix for #168772 and #168525.	2025-11-19 22:32:26 -08:00
Matt Arsenault	db20a7f2bc	DAG: Fix constructing a temporary TargetTransformInfo instance (#168480 )	2025-11-20 01:19:23 -05:00
Carl Ritson	b1c4b55118	RenameIndependentSubregs: try to only implicit def used subregs (#167486 ) Attempt to only define used subregisters when creating IMPLICIT_DEF fix ups for live interval subranges. This avoids the appearance at the MIR level of entire (wide) registers becoming live rather than relying only on transient LiveIntervals dead definitions for unused subregisters.	2025-11-20 09:28:34 +09:00
Matt Arsenault	253ed52436	DAG: Use poison for some vector result widening (#168290 )	2025-11-19 16:49:43 -05:00
Matt Arsenault	a757c4e74e	CodeGen: Add subtarget to TargetLoweringBase constructor (#168620 ) Currently LibcallLoweringInfo is defined inside of TargetLowering, which is owned by the subtarget. Pass in the subtarget so we can construct LibcallLoweringInfo with the subtarget. This is a temporary step that should be revertable in the future, after LibcallLoweringInfo is moved out of TargetLowering.	2025-11-19 19:18:13 +00:00
Matt Arsenault	0b921f52cc	DAG: Use poison when splitting vector_shuffle results (#168176 )	2025-11-19 12:27:08 -05:00
Ryan Cowan	58e6d02aa2	[AArch64][GlobalISel] Check unmergeSrc is a vector in matchCombineBuildUnmerge (#168692 ) This aims to fix the crash in #168495, my combine rule was missing a check that the source vector was in fact a vector. This then caused the legality check to fail in this example as the concat was trying to concat a non vector. I have also gated the bitcast of the concat to only work on non-scalable vectors as the mutation calls `getNumElements` which crashes when called on a scalable vector. Fixes #168495	2025-11-19 12:30:51 +00:00
陈子昂	e38529ddbb	[DAG] Update canCreateUndefOrPoison to handle ISD::VECTOR_COMPRESS (#168010 ) Fixes #167710	2025-11-19 10:21:05 +00:00
Tom Tromey	1262acf4ec	Introduce DwarfUnit::addBlock helper method (#168446 ) This patch is just a small cleanup that unifies the various spots that add a DWARF expression to the output.	2025-11-18 22:59:36 +00:00
Craig Topper	1157a22134	[GISel] Use getScalarSizeInBits in LegalizerHelper::lowerBitCount (#168584 ) For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The lowering should be based on the element width. I noticed this by inspection. No tests in tree are currently affected, but I thought it would be good to fix so someone doesn't have to debug it in the future.	2025-11-18 12:26:47 -08:00
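To illustrate why the element width matters, here is a scalar sketch of a per-lane CTLZ on a `<4 x i8>` vector. Counting against the total vector width (32 bits) would give a different answer for the same lane value. The helpers `ctlz` and `ctlz_v4i8` are hypothetical and are not LegalizerHelper code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Count leading zeros of `x` interpreted as a `width`-bit integer
// (assumes 1 <= width <= 32). Returns `width` when x == 0.
unsigned ctlz(uint32_t x, unsigned width) {
  unsigned n = 0;
  for (int bit = static_cast<int>(width) - 1; bit >= 0; --bit) {
    if (x & (1u << bit))
      break;
    ++n;
  }
  return n;
}

// Per-element CTLZ on a <4 x i8> vector: each lane is measured against
// the element width (8 bits), not the total vector width (32 bits).
std::array<unsigned, 4> ctlz_v4i8(const std::array<uint8_t, 4> &v) {
  std::array<unsigned, 4> r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = ctlz(v[i], 8);
  return r;
}
```

A lane holding `1` has 7 leading zeros as an `i8`, but 31 if the count were mistakenly sized by the 32-bit total.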
Craig Topper	96e58b83a3	[RISCV] Legalize misaligned unmasked vp.load/vp.store to vle8/vse8. (#167745 ) If vector-unaligned-mem support is not enabled, we should not generate loads/stores that are not aligned to their element size. We already do this for non-VP vector loads/stores. This code has been in our downstream for about a year and a half after finding the vectorizer generating misaligned loads/stores. I don't think that is unique to our downstream. Doing this for masked vp.load/store requires widening the mask as well which is harder to do. NOTE: Because we have to scale the VL, this will introduce additional vsetvli and the VL optimizer will not be effective at optimizing any arithmetic that is consumed by the store.	2025-11-18 11:13:54 -08:00
Hongyu Chen	523bd2df6d	[GISel][RISCV] Compute CTPOP of small odd-sized integer correctly (#168559 ) Fixes the assertion in #168523 This patch lifts the small, odd-sized integer to 8 bits, ensuring that the following lowering code behaves correctly.	2025-11-18 18:49:13 +00:00
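The sketch below shows the general idea under stated assumptions. It applies a byte-sized bit-parallel popcount (the classic subtract, mask and add sequence, whose mask constants are byte-sized) to a small value after zero-extending it to 8 bits. It does not claim that GlobalISel emits exactly this sequence; `ctpop8` and `ctpop_small` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Bit-parallel popcount of an 8-bit value.
uint8_t ctpop8(uint8_t v) {
  v = static_cast<uint8_t>(v - ((v >> 1) & 0x55));          // 2-bit counts
  v = static_cast<uint8_t>((v & 0x33) + ((v >> 2) & 0x33)); // 4-bit counts
  v = static_cast<uint8_t>((v + (v >> 4)) & 0x0F);          // add nibbles
  return v;
}

// CTPOP of a small integer of `bits` width (assumes 1 <= bits <= 8):
// zero-extend to 8 bits by clearing everything above `bits`, then run the
// byte-sized sequence, whose masks cover whole bit-pairs and nibbles.
uint8_t ctpop_small(uint8_t value, unsigned bits) {
  uint8_t widened = static_cast<uint8_t>(value & ((1u << bits) - 1));
  return ctpop8(widened);
}
```

Clearing the bits above the odd width first means stray high bits (as in `0xFF` treated as an `i3`) cannot leak into the count.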
Nathan Corbyn	93a8ca8fc7	[AArch64][GISel] Don't crash in known-bits when copying from vectors to non-vectors (#168081 ) Updates the demanded elements before recursing through copies in case the type of the source register changes from a non-vector register to a vector register. Fixes #167842.	2025-11-18 16:42:58 +00:00
Hassnaa Hamdi	3d5d32c605	[CGP]: Optimize mul.overflow. (#148343 ) - Detect cases where the LHS and RHS values cannot cause overflow (when their high halves are zero).	2025-11-18 13:15:47 +00:00
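The reasoning behind the fast path is easy to check in scalar code. If the high 32-bit halves of both 64-bit operands are zero, each operand is below 2^32, so the product is at most (2^32 - 1)^2 and cannot overflow 64 bits. This C++ sketch assumes unsigned 64-bit operands and uses a division-based check for the slow path. It models the idea only, not the IR that CodeGenPrepare emits, and `umul64_overflow` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 64-bit multiply with overflow detection. `result` always
// receives the product modulo 2^64; the return value reports overflow.
bool umul64_overflow(uint64_t a, uint64_t b, uint64_t &result) {
  result = a * b; // unsigned arithmetic wraps modulo 2^64
  // Fast path: both high halves are zero, so the product fits.
  if ((a >> 32) == 0 && (b >> 32) == 0)
    return false;
  // Slow path: a wrapped product no longer divides back to b.
  return a != 0 && result / a != b;
}
```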
David Green	4ecfaa602f	[AArch64][GlobalISel] Add better basic legalization for llround. (#168427 ) This adds handling for f16 and f128 lround/llround under LP64 targets, promoting the f16 where needed and using a libcall for f128. This codegen is now identical to the selection dag version.	2025-11-18 12:05:02 +00:00
Sander de Smalen	f369a53d82	[DAGCombiner] Fold select into partial.reduce.add operands. (#167857 ) This generates more optimal codegen when using partial reductions with predication.
```
partial_reduce_mla(acc, sel(p, mul(ext(a), ext(b)), splat(0)), splat(1))
  -> partial_reduce_mla(acc, sel(p, a, splat(0)), b)

partial.reduce.mla(acc, sel(p, ext(op), splat(0)), splat(1))
  -> partial.reduce.*mla(acc, sel(p, op, splat(0)), splat(trunc(1)))
```	2025-11-18 09:49:42 +00:00
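A scalar C++ model can check the first rewrite on concrete values. It assumes a sign-extending multiply-accumulate with `int8_t` inputs and `int32_t` accumulators, and that accumulator lane `j` sums every `M`-th input lane starting at `j`. The fold itself does not depend on that particular grouping. The names `partial_reduce_mla`, `fold_before` and `fold_after` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 8; // input lanes
constexpr std::size_t M = 2; // accumulator lanes

// Sign-extending partial multiply-accumulate: input lane i adds
// ext(x[i]) * ext(y[i]) into accumulator lane i % M (assumed grouping).
template <typename T>
std::array<int32_t, M> partial_reduce_mla(std::array<int32_t, M> acc,
                                          const std::array<T, N> &x,
                                          const std::array<T, N> &y) {
  for (std::size_t i = 0; i < N; ++i)
    acc[i % M] += static_cast<int32_t>(x[i]) * static_cast<int32_t>(y[i]);
  return acc;
}

// Before: select over the extended product, then multiply by splat(1).
std::array<int32_t, M> fold_before(std::array<int32_t, M> acc,
                                   const std::array<int8_t, N> &a,
                                   const std::array<int8_t, N> &b,
                                   const std::array<bool, N> &p) {
  std::array<int32_t, N> sel{}, ones{};
  for (std::size_t i = 0; i < N; ++i) {
    sel[i] = p[i] ? int32_t(a[i]) * int32_t(b[i]) : 0;
    ones[i] = 1;
  }
  return partial_reduce_mla(acc, sel, ones);
}

// After: select over the narrow operand `a`, multiply by `b` directly.
std::array<int32_t, M> fold_after(std::array<int32_t, M> acc,
                                  const std::array<int8_t, N> &a,
                                  const std::array<int8_t, N> &b,
                                  const std::array<bool, N> &p) {
  std::array<int8_t, N> sel{};
  for (std::size_t i = 0; i < N; ++i)
    sel[i] = p[i] ? a[i] : int8_t(0);
  return partial_reduce_mla(acc, sel, b);
}
```

Inactive lanes contribute zero either way (a zero product versus a zero multiplicand), which is why moving the select onto the narrow operand preserves the result.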
Aiden Grossman	472e4ab0b0	[MLGO] Fully Remove MLRegalloc Experimental Features (#168252 ) 20a22a45e96bc94c3a8295cccc9031bd87552725 was supposed to fully remove these, but left around the functionality to actually compute them and a unittest that ensured they worked. These are not development features in the sense of features used in development mode, but experimental features that have been superseded by MIR2Vec.	2025-11-17 10:07:48 -08:00
Ryan Cowan	d65be16ab6	[AArch64][GlobalISel] Add combine for build_vector(unmerge, unmerge, undef, undef) (#165539 ) This PR adds a new combine to the `post-legalizer-combiner` pass. The new combine checks for vectors being unmerged and subsequently padded with `G_IMPLICIT_DEF` values by building a new vector. If such a case is found, the vector being unmerged is instead just concatenated with a `G_IMPLICIT_DEF` that is as wide as the vector being unmerged. This removes unnecessary `mov` instructions in a few places.	2025-11-17 15:55:40 +00:00
David Green	22968f5b4a	[DAG] Add strictfp implicit def reg after metadata. (#168282 ) This prevents a machine verifier error, where it "Expected implicit register after groups". Fixes #158661	2025-11-17 10:57:21 +00:00
Abinaya Saravanan	c946418330	[MachinePipeliner] Detect a cycle in PHI dependencies early on (#167095 ) - This patch detects cycles formed by PHIs and bails out if one is found. - This prevents violating DAG restrictions. Pipelining is aborted in the case below:
```
%1 = phi i32 [ %a, %entry ], [ %3, %loop ]
%2 = phi i32 [ %a, %entry ], [ %1, %loop ]
%3 = phi i32 [ %b, %entry ], [ %2, %loop ]
```
--------- Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro@fujitsu.com>	2025-11-17 15:28:30 +05:30
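One way to detect such a cycle is a depth-first search over the loop-carried PHI-to-PHI edges. In the example, `%1` takes `%3` from the latch, `%3` takes `%2`, and `%2` takes `%1`, so the edges close a loop. The sketch assumes a simple name-keyed graph; `PhiGraph` and `hasPhiCycle` are hypothetical, and this is not necessarily how MachinePipeliner implements the check.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: each PHI maps to the PHIs that appear as
// its incoming values from the loop latch. Non-PHI operands are omitted.
using PhiGraph = std::map<std::string, std::vector<std::string>>;

enum class Color { White, Grey, Black }; // unvisited, on path, finished

static bool dfs(const PhiGraph &g, const std::string &n,
                std::map<std::string, Color> &color) {
  color[n] = Color::Grey; // n is on the current DFS path
  auto it = g.find(n);
  if (it != g.end()) {
    for (const std::string &succ : it->second) {
      Color c = color[succ]; // value-initialized to White if unseen
      if (c == Color::Grey)
        return true; // back edge: the PHIs depend on each other cyclically
      if (c == Color::White && dfs(g, succ, color))
        return true;
    }
  }
  color[n] = Color::Black; // fully explored without finding a cycle
  return false;
}

// Returns true if the loop-carried PHI dependencies form a cycle, which
// is the situation where pipelining would be abandoned.
bool hasPhiCycle(const PhiGraph &g) {
  std::map<std::string, Color> color;
  for (const auto &entry : g)
    if (color[entry.first] == Color::White && dfs(g, entry.first, color))
      return true;
  return false;
}
```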
pvanhout	853ed3b3b7	[InlineAsmLowering] unsigned -> TypeSize for getTypeStoreSize result	2025-11-17 10:21:43 +01:00
hstk30-hw	51c8180515	[GlobalMerge]Prefer use global-merge-max-offset instead of the target-specific constant offset. (#165591 ) In the Dhrystone benchmark, I found that some adjacent globals are not merged, whereas GCC's anchor optimization does handle them. Using global-merge-max-offset to set the maximum offset yields similar results (still slightly different, but at least the offset can be controlled).	2025-11-17 15:37:51 +08:00
ronlieb	6d5f87fc42	Revert "DAG: Allow select ptr combine for non-0 address spaces" (#168292 ) Reverts llvm/llvm-project#167909	2025-11-16 18:35:51 -05:00
Kazu Hirata	98d49d51c0	[CodeGen] Remove a redundant declaration (NFC) (#168285 ) EnableFSDiscriminator is declared in DebugInfoMetadata.h. Identified with readability-redundant-declaration.	2025-11-16 14:06:18 -08:00
Matt Arsenault	dd9bd3e8f0	DAG: Preserve poison in combineConcatVectorOfScalars (#168220 )	2025-11-16 11:16:34 -08:00
Sergei Barannikov	97a60aa37a	[CodeGen] Turn MCRegUnit into an enum class (NFC) (#167943 ) This changes `MCRegUnit` type from `unsigned` to `enum class : unsigned` and inserts necessary casts. The added `MCRegUnitToIndex` functor is used with `SparseSet`, `SparseMultiSet` and `IndexedMap` in a few places. `MCRegUnit` is opaque to users, so it didn't seem worth making it a full-fledged class like `Register`. Static type checking has detected one issue in `PrologueEpilogueInserter.cpp`, where `BitVector` created for `MCRegister` is indexed by both `MCRegister` and `MCRegUnit`. The number of casts could be reduced by using `IndexedMap` in more places and/or adding a `BitVector` adaptor, but the number of casts per file is still small and `IndexedMap` has limitations, so it didn't seem worth the effort. Pull Request: https://github.com/llvm/llvm-project/pull/167943	2025-11-16 20:46:44 +03:00
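Here is a standalone mimic of the idea, not LLVM's definitions: an opaque `enum class` handle plus a functor that converts it to a container index. With this setup, indexing with a plain `unsigned` fails to compile. `RegUnit`, `RegUnitToIndex` and `RegUnitMap` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Opaque register-unit handle: an enum class over unsigned does not
// convert implicitly to or from other integer indices.
enum class RegUnit : unsigned {};

// Functor mapping a RegUnit to a container index, in the spirit of the
// index functors used with SparseSet/IndexedMap-style containers.
struct RegUnitToIndex {
  unsigned operator()(RegUnit U) const { return static_cast<unsigned>(U); }
};

// A tiny dense map that only accepts RegUnit keys; m[2u] would not
// compile, which is the static check the opaque type buys.
template <typename V> class RegUnitMap {
  std::vector<V> storage;
  RegUnitToIndex toIndex;

public:
  explicit RegUnitMap(unsigned numUnits) : storage(numUnits) {}
  V &operator[](RegUnit U) { return storage[toIndex(U)]; }
};
```

The explicit `static_cast` at the boundary is the cost of this design; the commit notes that the number of such casts per file stayed small.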
Sergei Barannikov	e413343ca7	[SelectionDAG] Verify SDTCisVT and SDTCVecEltisVT constraints (#150125 ) Teach `SDNodeInfoEmitter` TableGen backend to process `SDTypeConstraint` records and emit tables for them. The tables are used by `SDNodeInfo::verifyNode()` to validate a node being created. This PR only adds validation code for `SDTCisVT` and `SDTCVecEltisVT` constraints to keep it smaller. Pull Request: https://github.com/llvm/llvm-project/pull/150125	2025-11-16 18:26:03 +03:00
AZero13	d831f8df52	[SelectionDAG] Fix AArch64 machine verifier bug when expanding LOOP_DEPENDENCE_MASK (#168221 ) TargetConstant nodes don't match TableGen ImmLeaf patterns during instruction selection. When this zero constant flows into the AArch64 CCMP formation code, the machine verifier hits an assertion in expensive checks. Fixes: #168227	2025-11-15 21:12:11 +00:00


38740 Commits