llvm-project

Author	SHA1	Message	Date
Tobias Stadler	1302610f03	[MergeFunc] Fix crash caused by bitcasting ArrayType (#133259 ) createCast in MergeFunctions did not consider ArrayTypes, which results in the creation of a bitcast between ArrayTypes in the thunk function, leading to an assertion failure in the provided test case. The version of createCast in GlobalMergeFunctions does handle ArrayTypes, so this common code has been factored out into the IRBuilder.	2025-04-04 10:16:40 +01:00
zhijian lin	1a540c3b8b	[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#133155 ) ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated; use ISD::UADDO_CARRY and ISD::USUBO_CARRY instead. This patch lowers UADDO, UADDO_CARRY, USUBO and USUBO_CARRY.	2025-04-03 13:22:49 -04:00
Nikita Popov	efbbdd69c7	[ADT] Make DenseMap::init() private (NFC) (#134229 ) I believe this method was not supposed to be public, as it has additional preconditions (it will misbehave when called on a non-empty DenseMap). The public API for this is reserve().	2025-04-03 15:14:45 +02:00
David Green	6c27817294	[SelectionDAG] Use SimplifyDemandedBits from SimplifyDemandedVectorElts Bitcast. (#133717 ) This adds a call to SimplifyDemandedBits from bitcasts with scalar input types in SimplifyDemandedVectorElts, which can help simplify the input scalar.	2025-04-03 11:14:08 +01:00
Hua Tian	7e65944292	[llvm][CodeGen] avoid repeated interval calculation in window scheduler (#132352 ) Some new registers are reused when replacing old ones in certain use cases of ModuloScheduleExpander. It is necessary to avoid repeated interval calculations for these registers.	2025-04-03 14:25:55 +08:00
LU-JOHN	6a46c6c865	Ensure KnownBits passed when calculating from range md has right size (#132985 ) KnownBits passed to computeKnownBitsFromRangeMetadata must have the same bit width as the range metadata bit width. Otherwise the calculated results will be incorrect. --------- Signed-off-by: John Lu <John.Lu@amd.com>	2025-04-03 10:17:14 +07:00
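The bit-width precondition described in the entry above can be illustrated with a small model. This is a simplified sketch, not LLVM's implementation: the `KnownBits` class and `known_bits_from_range` function are hypothetical Python stand-ins, and the range is assumed to be a single non-wrapping unsigned interval `[lo, hi)`. The known bits are the leading bits shared by the smallest and largest values in the range.

```python
from dataclasses import dataclass


@dataclass
class KnownBits:
    """Hypothetical stand-in: bit masks of bits known to be 0 and known to be 1."""
    width: int
    zero: int = 0
    one: int = 0


def known_bits_from_range(known: KnownBits, lo: int, hi: int, range_width: int) -> KnownBits:
    """Fill `known` from the unsigned, non-wrapping range [lo, hi)."""
    # The precondition from the commit message: widths must agree,
    # otherwise the masks computed below would describe the wrong value.
    if known.width != range_width:
        raise ValueError("KnownBits width must match the range metadata width")
    if not (0 <= lo < hi <= (1 << range_width)):
        raise ValueError("expected a non-empty, non-wrapping range")

    max_value = hi - 1
    # Bits above the highest differing bit are identical for every value in range.
    differing = (lo ^ max_value).bit_length()
    full_mask = (1 << range_width) - 1
    prefix_mask = full_mask & ~((1 << differing) - 1)

    known.one = lo & prefix_mask
    known.zero = ~lo & prefix_mask & full_mask
    return known


if __name__ == "__main__":
    kb = known_bits_from_range(KnownBits(width=8), lo=64, hi=80, range_width=8)
    print(f"known one:  {kb.one:#010b}")
    print(f"known zero: {kb.zero:#010b}")
```

Passing an object of a different width (for example `KnownBits(width=16)` with `range_width=8`) raises an error in this sketch instead of silently producing masks for the wrong width.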
Sami Tolvanen	acc6bcdc50	Support alternative sections for patchable function entries (#131230 ) With -fpatchable-function-entry (or the patchable_function_entry function attribute), we emit records of patchable entry locations to the __patchable_function_entries section. Add an additional parameter to the command line option that allows one to specify a different default section name for the records, and an identical parameter to the function attribute that allows one to override the section used. The main use case for this change is the Linux kernel using prefix NOPs for ftrace, and thus depending on __patchable_function_entries to locate traceable functions. Functions that are not traceable currently disable entry NOPs using the function attribute, but this creates a compatibility issue with -fsanitize=kcfi, which expects all indirectly callable functions to have a type hash prefix at the same offset from the function entry. Adding a section parameter would allow the kernel to distinguish between traceable and non-traceable functions by adding entry records to separate sections while maintaining a stable function prefix layout for all functions. LKML discussion: https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/	2025-04-02 21:53:55 +00:00
Ryan Buchner	fa2a6d68c6	[CodeGenPrepare][RISCV] Combine (X ^ Y) and (X == Y) where appropriate (#130922 ) Fixes #130510. In RISCV, modify the folding of (X ^ Y == 0) -> (X == Y) to account for cases where the (X ^ Y) will be re-used. If a constant is being used for the XOR before a branch, ensure that it is small enough to fit within a 12-bit immediate field. Otherwise, the equality check is more efficient than the check against 0, see the following: ``` # %bb.0: lui a1, 5 addiw a1, a1, 1365 xor a0, a0, a1 beqz a0, .LBB0_2 # %bb.1: ret .LBB0_2: ``` ``` # %bb.0: lui a1, 5 addiw a1, a1, 1365 beq a0, a1, .LBB0_2 # %bb.1: xor a0, a0, a1 ret .LBB0_2: ``` Similarly, if the XOR is between 1 and a size one integer, we should still fold away the XOR since that comparison can be optimized as a comparison against 0. ``` # %bb.0: slt a0, a0, a1 xor a0, a0, 1 beqz a0, .LBB0_2 # %bb.1: ret .LBB0_2: ``` ``` # %bb.0: slt a0, a0, a1 bnez a0, .LBB0_2 # %bb.1: xor a0, a0, 1 ret .LBB0_2: ``` One question about my code is that I used a hard-coded value for the width of a RISCV ALU immediate. Do you know of a way that I can gather this from the `context`, I was unable to devise one.	2025-04-02 09:56:09 -07:00
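The 12-bit immediate check mentioned in the entry above can be sketched as follows. This is an illustration only, not the code from the patch: `fits_simm12` and `lui_addiw` are hypothetical helper names, and the sketch assumes the usual signed 12-bit range of RISC-V ALU immediates, [-2048, 2047].

```python
def fits_simm12(value: int) -> bool:
    """Return True if `value` fits a signed 12-bit immediate field."""
    return -2048 <= value <= 2047


def lui_addiw(upper20: int, lower12: int) -> int:
    """Constant built by `lui rd, upper20` followed by `addiw rd, rd, lower12`.

    Simplified: assumes a small positive upper20, so no sign-extension
    effects need to be modelled.
    """
    return (upper20 << 12) + lower12


if __name__ == "__main__":
    # The constant from the first assembly listing: lui a1, 5 ; addiw a1, a1, 1365
    constant = lui_addiw(5, 1365)
    print(hex(constant), fits_simm12(constant))
    # A constant such as 1 can be encoded directly in an immediate field.
    print(fits_simm12(1))
```

The constant in the listing needs two instructions to materialize, which is why comparing the two registers directly is cheaper there than XOR-ing and comparing against zero.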
Nikita Popov	9356091a98	[GlobalMerge][PPC] Don't merge globals in llvm.metadata section (#131801 ) The llvm.metadata section is not emitted and has special semantics. We should not merge globals in it, similarly to how we already skip merging of `llvm.xyz` globals. Fixes https://github.com/llvm/llvm-project/issues/131394.	2025-04-02 10:40:53 +02:00
Petr Hosek	4b19db6db9	Revert "AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function" (#133935 ) Reverts llvm/llvm-project#132684	2025-04-01 09:39:07 -07:00
Jeremy Morse	1ebc308bba	[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855 ) During the transition from debug intrinsics to debug records, we used several different command line options to customise handling: the printing of debug records to bitcode and textual could be independent of how the debug-info was represented inside a module, whether the autoupgrader ran could be customised. This was all valuable during development, but now that totally removing debug intrinsics is coming up, this patch removes those options in favour of a single flag (experimental-debuginfo-iterators), which enables autoupgrade, in-memory debug records, and debug record printing to bitcode and textual IR. We need to do this ahead of removing the experimental-debuginfo-iterators flag, to reduce the amount of test-juggling that happens at that time. There are quite a number of weird test behaviours related to this -- some of which I simply delete in this commit. Things like print-non-instruction-debug-info.ll , the test suite now checks for debug records in all tests, and we don't want to check we can print as intrinsics. Or the update_test_checks tests -- these are duplicated with write-experimental-debuginfo=false to ensure file writing for intrinsics is correct, but that's something we're imminently going to delete. A short survey of curious test changes: * free-intrinsics.ll: we don't need to test that debug-info is a zero cost intrinsic, because we won't be using intrinsics in the future. * undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory mode while we sorted something out; it works now either way. * salvage-cast-debug-info.ll: was testing intrinsics-in-memory get salvaged, isn't necessary now * localize-constexpr-debuginfo.ll: was producing "dead metadata" intrinsics for optimised-out variable values, dbg-records takes the (correct) representation of poison/undef as an operand. 
Looks like we didn't update this in the past to avoid spurious test differences. * Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing that debug-info affected codegen, and we deferred updating the tests until now. This is just one of those silent gnochange issues that get fixed by RemoveDIs. Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc, that checks we can autoupgrade debug intrinsics that are in bitcode into the new debug records.	2025-04-01 14:27:11 +01:00
Akshat Oke	4a68702455	[CodeGen][NPM] Port XRayInstrumentation to NPM (#129865 )	2025-04-01 15:38:49 +05:30
Afanasyev Ivan	337bad3921	[EarlyIfConverter] Fix reg killed twice after early-if-predicator and ifcvt (#133554 ) Bug relates to `early-if-predicator` and `early-ifcvt` passes. If virtual register has "killed" flag in both basic blocks to be merged into head, both instructions in head basic block will have "killed" flag for this register. It makes MIR incorrect. Example: ``` bb.0: ; if ... %0:intregs = COPY $r0 J2_jumpf %2, %bb.2, implicit-def dead $pc J2_jump %bb.1, implicit-def dead $pc bb.1: ; if.then ... S4_storeiri_io killed %0, 0, 1 J2_jump %bb.3, implicit-def dead $pc bb.2: ; if.else ... S4_storeiri_io killed %0, 0, 1 J2_jump %bb.3, implicit-def dead $pc ``` After early-if-predicator will become: ``` bb.0: %0:intregs = COPY $r0 S4_storeirif_io %1, killed %0, 0, 1 S4_storeirit_io %1, killed %0, 0, 1 ``` Having `killed` flag set twice in bb.0 for `%0` is an incorrect MIR.	2025-04-01 12:06:30 +02:00
Fangrui Song	dd862356e2	AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function https://reviews.llvm.org/D17938 introduced lowerRelativeReference to give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an `unnamed_addr` function, create a PLT-generating relocation. This was intended for C++ relative vtables, but C++ relative vtable ended up using DSOLocalEquivalent (lowerDSOLocalEquivalent). This special treatment of `unnamed_addr` seems unusual. Let's remove it. Only COFF needs an overload to generate a @IMGREL32 relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll). Pull Request: https://github.com/llvm/llvm-project/pull/132684	2025-03-31 20:44:29 -07:00
3405691582	c180e249d0	Fix crash lowering stack guard on OpenBSD/aarch64. (#125416 ) TargetLoweringBase::getIRStackGuard refers to a platform-specific guard variable. Before this change, TargetLoweringBase::getSDagStackGuard only referred to a different variable. This means that SelectionDAGBuilder's getLoadStackGuard does not get memory operands. However, AArch64InstrInfo::expandPostRAPseudo assumes that the passed MachineInstr has nonzero memoperands, causing a segfault. We have two possible options here: either disabling the LOAD_STACK_GUARD node entirely in AArch64TargetLowering::useLoadStackGuardNode or just making the platform-specific values match across TargetLoweringBase. Here, we try the latter.	2025-03-31 09:17:55 -07:00
Rahul Joshi	74b7abf154	[IRBuilder] Add new overload for CreateIntrinsic (#131942 ) Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.	2025-03-31 08:10:34 -07:00
Tom Tromey	68947342b7	Add support for fixed-point types (#129596 ) This adds DWARF generation for fixed-point types. This feature is needed by Ada. Note that a pre-existing GNU extension is used in one case. This has been emitted by GCC for years, and is needed because standard DWARF is otherwise incapable of representing these types.	2025-03-31 07:42:21 -07:00
Simon Pilgrim	9b32f3d096	[DAG] visitEXTRACT_SUBVECTOR - don't return early on failure of EXTRACT_SUBVECTOR(INSERT_SUBVECTOR()) -> BITCAST fold (#133695 ) Always allow later folds to try to match as well.	2025-03-31 14:32:43 +01:00
Liqiang TAO	1f7f268f30	StackProtector: use isInTailCallPosition to verify tail call position (#68997 ) The issue is caused by [D133860](https://reviews.llvm.org/D133860). The guard would be inserted in wrong place in some cases, like the test case showed below. This patch fixed the issue by using `isInTailCallPosition()` to verify whether the tail call is in right position.	2025-03-30 11:21:19 -07:00
Mingming Liu	9747bb182f	[CodeGen][StaticDataSplitter]Support constant pool partitioning (#129781 ) This is a follow-up patch of https://github.com/llvm/llvm-project/pull/125756 In this PR, static-data-splitter pass produces the aggregated profile counts of constants for constant pools in a global state (`StateDataProfileInfo`), and asm printer consumes the profile counts to produce `.hot` or `.unlikely` prefixes. This implementation covers both x86 and aarch64 asm printer.	2025-03-29 22:07:56 -07:00
Kazu Hirata	e3a3f78f35	[CodeGen] Use llvm::append_range (NFC) (#133603 )	2025-03-29 16:53:02 -07:00
Fangrui Song	fe6fb910df	[RISCV] Replace @plt/@gotpcrel in data directives with %pltpcrel %gotpcrel clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and `@gotpcrel` specifiers in data directives. The syntax is not used in human-written assembly code, and is not supported by GNU assembler. Note: the `@plt` in `.word foo@plt` is different from the legacy `call func@plt` (where `@plt` is simply ignored). The `@plt` syntax was selected simply due to a quirk of AsmParser: the syntax was supported by all targets until I updated it to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c. RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc, and we should follow this convention. This PR adds support for `.word %pltpcrel(foo+offset)` and `.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`. * MCValue::SymA can no longer have a SymbolVariant. Add an assert similar to that of AArch64ELFObjectWriter.cpp before https://reviews.llvm.org/D81446 (see my analysis at https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers if intrigued) * `jump foo@plt, x31` now has a different diagnostic. Pull Request: https://github.com/llvm/llvm-project/pull/132569	2025-03-29 11:08:13 -07:00
Simon Pilgrim	666faa7fd9	[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses (REAPPLIED) (#133401 ) Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded elts across all users and call SimplifyDemandedVectorElts in "AssumeSingleUse" use. Second try after #133130 was reverted by #133331 due to it affecting reverted test files	2025-03-29 17:55:38 +00:00
Tim Gymnich	1d0005a69a	[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466 ) - rename `GISelKnownBits` to `GISelValueTracking` to analyze more than just `KnownBits` in the future	2025-03-29 11:51:29 +01:00
Kazu Hirata	f915015a3e	[llvm] Remove extraneous calls to make_range (NFC) (#133551 )	2025-03-28 19:56:02 -07:00
Kazu Hirata	d4427f308e	[llvm] Use range constructors of *Set (NFC) (#133549 )	2025-03-28 19:55:18 -07:00
Mingming Liu	c8a70f4c6e	[CodeGen][StaticDataPartitioning]Place local-linkage global variables in hot or unlikely prefixed sections based on profile information (#125756 ) In this PR, static-data-splitter pass finds out the local-linkage global variables in {`.rodata`, `.data.rel.ro`, `bss`, `.data`} sections by analyzing machine instruction operands, and aggregates their accesses from code across functions. A follow-up item is to analyze global variable initializers and count for access from data. * This limitation is demonstrated by `bss2` and `data3` in `llvm/test/CodeGen/X86/global-variable-partition.ll`. Some stats of static-data-splitter with this patch: hot-prefixed section coverage is 99.75% for bss, 97.71% for rodata and 91.30% for data; unlikely-prefixed section size percentage is 67.94% for bss, 39.37% for rodata and 63.10% for data. 1. The coverage is defined as `#perf-sample-in-hot-prefixed <data> section / #perf-sample in <data.> section` for each <data> section. The perf command samples `MEM_INST_RETIRED.ALL_LOADS:u:pinned:precise=2` events at a high frequency (`perf -c 2251`) for 30 seconds. The profiled binary is built as non-PIE so `data.rel.ro` coverage data is not available. 2. The unlikely-prefixed `<data>` section size percentage is defined as `unlikely <data> section size / the sum size of <data>.* sections` for each `<data>` section	2025-03-28 16:31:46 -07:00
Kazu Hirata	673f4705a8	[llvm] Use Set::insert_range (NFC) (#133353 ) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(Elem.first); down to: Set.insert_range(llvm::make_first_range(Range)); In some cases, we can further fold that into the set declaration.	2025-03-27 20:44:20 -07:00
Walter Lee	5b7fd708fe	Revert "[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses" (#133331 ) Reverts llvm/llvm-project#133130 This touches a common file as #133083, which is causing failures	2025-03-27 18:36:38 -04:00
Philip Reames	c90a536bcf	[CodeGen] Simplify code using TypeSize overloads of getMachineMemOperand [nfc] These were added in d584cea. This change runs through existing uses and simplifies where obvious.	2025-03-27 11:47:51 -07:00
Simon Pilgrim	a8575b3ea8	[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses (#133130 ) Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded elts across all users and call SimplifyDemandedVectorElts in "AssumeSingleUse" use.	2025-03-27 15:31:06 +00:00
LU-JOHN	2df25a4733	Invalidate range metadata when folding bitcast into load (#133095 )	2025-03-27 14:10:55 +07:00
Philip Reames	79e82b6f14	[RISCV] Use a precise size for MMO on scalable spill and fill (#133171 ) The primary effect of this is that we get proper scalable sizes printed by the assembler, but this may also enable proper aliasing analysis. I don't see any test changes resulting from the latter. Getting the size is slightly tricky as we store the scalable size as a non-scalable quantity in the object size field for the frame index. We really should remove that hack at some point... For the synthetic tuple spills and fills, I dropped the size from the split loads and stores to avoid incorrect (overly large) sizes. We could also divide by the NF factor if we felt like writing the code to do so.	2025-03-26 18:25:59 -07:00
Ethan Kaji	a629b50575	Port `NVPTXTargetLowering::LowerCONCAT_VECTORS` to SelectionDAG (#120030 ) Ports `NVPTXTargetLowering::LowerCONCAT_VECTORS` to `llvm/lib/CodeGen/SelectionDAG` as requested in https://github.com/llvm/llvm-project/issues/116695.	2025-03-27 07:40:35 +07:00
Craig Topper	6075275e68	[AsmPrinter] Don't pass Twine by value. NFC	2025-03-26 15:15:12 -07:00
Philip Reames	236f938ef6	[CodeGen] Provide a target independent default for optimizeLoadInst [NFC] This just moves the x86 implementation into generic code since it appears to be suitable for any target. The heart of this transform is inside foldMemoryOperand so other targets won't actually kick in until they implement said API. This just removes one piece to implement in the process of enabling foldMemoryOperand.	2025-03-26 08:52:40 -07:00
dianqk	66f158d918	[TailDuplicator] Determine if computed gotos using `blockaddress` (#132536 ) Using `blockaddress` should be more reliable than determining if an operand comes from a jump table index. Alternative: Add the `MachineInstr::MIFlag::ComputedGoto` flag when lowering `indirectbr`. But I don't think this approach is suitable to backport.	2025-03-26 21:27:43 +08:00
Tom Tromey	f89129af8a	Add bit stride to DICompositeType (#131680 ) In Ada, an array can be packed and the elements can take less space than their natural object size. For example, for this type: type Packed_Array is array (4 .. 8) of Boolean; pragma pack (Packed_Array); ... each element of the array occupies a single bit, even though the "natural" size for a Boolean in memory is a byte. In DWARF, this is represented by putting a DW_AT_bit_stride onto the array type itself. This patch adds a bit stride to DICompositeType so that gnat-llvm can emit DWARF for these sorts of arrays.	2025-03-25 17:14:07 -07:00
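The effect of a bit stride on element layout, as described in the entry above, can be sketched as follows. This is an illustration under stated assumptions, not gnat-llvm's or LLVM's code: the function names are hypothetical, and the sketch assumes elements are laid out from the least significant bit of a little-endian byte sequence. Real targets may order bits differently.

```python
def element_bit_offset(index: int, lower_bound: int, bit_stride: int) -> int:
    """Bit offset of element `index` from the start of the array storage."""
    return (index - lower_bound) * bit_stride


def read_packed_element(storage: bytes, index: int, lower_bound: int, bit_stride: int) -> int:
    """Extract one element from packed storage (assumed LSB-first, little-endian)."""
    offset = element_bit_offset(index, lower_bound, bit_stride)
    as_int = int.from_bytes(storage, "little")
    return (as_int >> offset) & ((1 << bit_stride) - 1)


if __name__ == "__main__":
    # Packed_Array is array (4 .. 8) of Boolean with a bit stride of 1:
    # five elements occupy five bits instead of five bytes.
    storage = bytes([0b00010101])
    for i in range(4, 9):
        print(i, read_packed_element(storage, i, lower_bound=4, bit_stride=1))
```

With a stride of 1, consecutive indices map to consecutive bits, which is the layout a debugger needs the bit stride attribute to decode.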
LU-JOHN	70aeb89094	Calculate KnownBits from Metadata correctly for vector loads (#128908 ) Calculate KnownBits correctly from metadata for vector loads. --------- Signed-off-by: John Lu <John.Lu@amd.com>	2025-03-25 22:46:30 +07:00
Jonathan Cohen	6785951410	[Machine-Combiner] Add a pass to reassociate chains of accumulation instructions into a tree (#132728 ) This pass is designed to increase ILP by performing accumulation into multiple registers. It currently supports only the S/UABAL accumulation instruction, but can be extended to support additional instructions. Reland of #126060 which was reverted due to a conflict with #131272.	2025-03-25 15:58:20 +02:00
Simon Pilgrim	0237216f16	[DAG] canCreateUndefOrPoison - add EXTRACT_SUBVECTOR handling (#132745 ) Similar to INSERT_SUBVECTOR - the index is constant and will be inbounds	2025-03-24 16:03:47 +00:00
Kazu Hirata	1904241a9e	[CodeGen] Avoid repeated hash lookups (NFC) (#132658 )	2025-03-24 07:46:35 -07:00
Pierre van Houtryve	c457c88951	[GlobalISel] Combine (sext (trunc x)) to (sext_inreg x) (#131622 ) Split from #131312	2025-03-24 09:32:04 +01:00
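The equivalence behind the combine above can be checked on plain integers. This is a minimal sketch with hypothetical helper names, modelling fixed-width values as non-negative Python integers; it is not GlobalISel code. Here `sext_inreg(x, low_bits, width)` sign-extends the low `low_bits` bits of a `width`-bit value in place.

```python
def mask(bits: int) -> int:
    return (1 << bits) - 1


def trunc(x: int, to_bits: int) -> int:
    """Keep only the low `to_bits` bits."""
    return x & mask(to_bits)


def sext(x: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a `from_bits`-wide value to `to_bits` bits."""
    sign = 1 << (from_bits - 1)
    signed = ((x & mask(from_bits)) ^ sign) - sign
    return signed & mask(to_bits)


def sext_inreg(x: int, low_bits: int, width: int) -> int:
    """Sign-extend the low `low_bits` bits of a `width`-bit value in place."""
    shift = width - low_bits
    shifted = (x << shift) & mask(width)
    if shifted & (1 << (width - 1)):
        shifted -= 1 << width  # reinterpret as negative for the arithmetic shift
    return (shifted >> shift) & mask(width)


if __name__ == "__main__":
    width, low_bits = 8, 4
    same = all(
        sext(trunc(x, low_bits), low_bits, width) == sext_inreg(x, low_bits, width)
        for x in range(1 << width)
    )
    print("sext(trunc x) == sext_inreg x for all 8-bit x:", same)
```

The exhaustive loop covers every 8-bit input, so it serves as a quick sanity check of the identity for that width rather than a proof for all widths.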
Pierre van Houtryve	6e3c24fc0a	[DAG] Combine (sext (sext_in_reg x)) to (sext_in_reg (any_extend x)) (#132386 )	2025-03-24 09:31:02 +01:00
Antonio Frighetto	ade2276517	[RegAllocFast] Ensure live-in vregs get reloaded after INLINEASM_BR spills We have already ensured in 9cec2b246e719533723562950e56c292fe5dd5ad that `INLINEASM_BR` output operands get spilled onto the stack, both in the fallthrough path and in the indirect targets. Since reloads of live-in values into physical registers contextually happen after all MIR instructions (and ops) have been visited, make sure such loads are placed at the start of the block, but after prologues or `INLINEASM_BR` spills, as otherwise this may cause stale values to be read from the stack. Fixes: #74483, #110251.	2025-03-24 09:19:53 +01:00
Fangrui Song	7e6d008023	AsmPrinter: Remove unneeded lowerRelativeReference overrides The function is only called by AsmPrinter, where there is a fallback when lowerRelativeReference returns nullptr. wasm and XCOFF could use the fallback code. (lowerRelativeReference was introduced in 2016 (https://reviews.llvm.org/D17938) for C++ relative vtables, but C++ relative vtables ended up using dso_local_equivalent. llvm/test/MC/COFF/cross-section-relative.ll also uses this.)	2025-03-23 23:58:41 -07:00
Akshat Oke	174110bf3c	[CodeGen][NPM] Port LiveDebugValues to NPM (#131563 )	2025-03-24 11:34:45 +05:30
Kazu Hirata	1019457891	[CodeGen] Use Set::insert_range (NFC) (#132651 ) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(Elem); down to: Set.insert_range(Range);	2025-03-23 21:20:44 -07:00
Mingming Liu	3b20ac00f9	[NFC]Don't use else after a return (#132644 ) A trivial code clean-up per https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return	2025-03-23 18:34:52 -07:00
Kazu Hirata	41b76119ec	[llvm] Use range constructors for *Set (NFC) (#132636 )	2025-03-23 15:50:34 -07:00

37511 Commits