llvm-project

Author	SHA1	Message	Date
3405691582	c180e249d0	Fix crash lowering stack guard on OpenBSD/aarch64. (#125416 ) TargetLoweringBase::getIRStackGuard refers to a platform-specific guard variable. Before this change, TargetLoweringBase::getSDagStackGuard only referred to a different variable. This means that SelectionDAGBuilder's getLoadStackGuard does not get memory operands. However, AArch64InstrInfo::expandPostRAPseudo assumes that the passed MachineInstr has nonzero memoperands, causing a segfault. We have two possible options here: either disabling the LOAD_STACK_GUARD node entirely in AArch64TargetLowering::useLoadStackGuardNode or just making the platform-specific values match across TargetLoweringBase. Here, we try the latter.	2025-03-31 09:17:55 -07:00
Rahul Joshi	74b7abf154	[IRBuilder] Add new overload for CreateIntrinsic (#131942 ) Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.	2025-03-31 08:10:34 -07:00
Tom Tromey	68947342b7	Add support for fixed-point types (#129596 ) This adds DWARF generation for fixed-point types. This feature is needed by Ada. Note that a pre-existing GNU extension is used in one case. This has been emitted by GCC for years, and is needed because standard DWARF is otherwise incapable of representing these types.	2025-03-31 07:42:21 -07:00
Simon Pilgrim	9b32f3d096	[DAG] visitEXTRACT_SUBVECTOR - don't return early on failure of EXTRACT_SUBVECTOR(INSERT_SUBVECTOR()) -> BITCAST fold (#133695 ) Always allow later folds to try to match as well.	2025-03-31 14:32:43 +01:00
Liqiang TAO	1f7f268f30	StackProtector: use isInTailCallPosition to verify tail call position (#68997 ) The issue is caused by [D133860](https://reviews.llvm.org/D133860). The guard would be inserted in the wrong place in some cases, as in the test case shown below. This patch fixes the issue by using `isInTailCallPosition()` to verify whether the tail call is in the right position.	2025-03-30 11:21:19 -07:00
Mingming Liu	9747bb182f	[CodeGen][StaticDataSplitter]Support constant pool partitioning (#129781 ) This is a follow-up patch of https://github.com/llvm/llvm-project/pull/125756 In this PR, static-data-splitter pass produces the aggregated profile counts of constants for constant pools in a global state (`StateDataProfileInfo`), and asm printer consumes the profile counts to produce `.hot` or `.unlikely` prefixes. This implementation covers both x86 and aarch64 asm printer.	2025-03-29 22:07:56 -07:00
Kazu Hirata	e3a3f78f35	[CodeGen] Use llvm::append_range (NFC) (#133603 )	2025-03-29 16:53:02 -07:00
Fangrui Song	fe6fb910df	[RISCV] Replace @plt/@gotpcrel in data directives with %pltpcrel %gotpcrel clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and `@gotpcrel` specifiers in data directives. The syntax is not used in human-written assembly code, and is not supported by GNU assembler. Note: the `@plt` in `.word foo@plt` is different from the legacy `call func@plt` (where `@plt` is simply ignored). The `@plt` syntax was selected simply due to a quirk of AsmParser: the syntax was supported by all targets until I updated it to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc, and we should follow this convention. This PR adds support for `.word %pltpcrel(foo+offset)` and `.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`. * MCValue::SymA can no longer have a SymbolVariant. Add an assert similar to that of AArch64ELFObjectWriter.cpp before https://reviews.llvm.org/D81446 (see my analysis at https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers if intrigued) * `jump foo@plt, x31` now has a different diagnostic. Pull Request: https://github.com/llvm/llvm-project/pull/132569	2025-03-29 11:08:13 -07:00
Simon Pilgrim	666faa7fd9	[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses (REAPPLIED) (#133401 ) Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded elts across all users and call SimplifyDemandedVectorElts in "AssumeSingleUse" use. Second try after #133130 was reverted by #133331 due to it affecting reverted test files	2025-03-29 17:55:38 +00:00
Tim Gymnich	1d0005a69a	[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466 ) - rename `GISelKnownBits` to `GISelValueTracking` to analyze more than just `KnownBits` in the future	2025-03-29 11:51:29 +01:00
Kazu Hirata	f915015a3e	[llvm] Remove extraneous calls to make_range (NFC) (#133551 )	2025-03-28 19:56:02 -07:00
Kazu Hirata	d4427f308e	[llvm] Use range constructors of *Set (NFC) (#133549 )	2025-03-28 19:55:18 -07:00
Mingming Liu	c8a70f4c6e	[CodeGen][StaticDataPartitioning]Place local-linkage global variables in hot or unlikely prefixed sections based on profile information (#125756 ) In this PR, static-data-splitter pass finds out the local-linkage global variables in {`.rodata`, `.data.rel.ro`, `bss`, `.data`} sections by analyzing machine instruction operands, and aggregates their accesses from code across functions. A follow-up item is to analyze global variable initializers and count for access from data. * This limitation is demonstrated by `bss2` and `data3` in `llvm/test/CodeGen/X86/global-variable-partition.ll`. Some stats of static-data-splitter with this patch:
section|bss|rodata|data
:-----:|:-----:|:-----:|:-----:
hot-prefixed section coverage|99.75%|97.71%|91.30%
unlikely-prefixed section size percentage|67.94%|39.37%|63.10%
1. The coverage is defined as `#perf-sample-in-hot-prefixed <data> section / #perf-sample in <data.> section` for each <data> section. The perf command samples `MEM_INST_RETIRED.ALL_LOADS:u:pinned:precise=2` events at a high frequency (`perf -c 2251`) for 30 seconds. The profiled binary is built as non-PIE so `data.rel.ro` coverage data is not available.
2. The unlikely-prefixed `<data>` section size percentage is defined as `unlikely <data> section size / the sum size of <data>.* sections` for each `<data>` section	2025-03-28 16:31:46 -07:00
Kazu Hirata	673f4705a8	[llvm] Use Set::insert_range (NFC) (#133353 ) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(Elem.first); down to: Set.insert_range(llvm::make_first_range(Range)); In some cases, we can further fold that into the set declaration.	2025-03-27 20:44:20 -07:00
Walter Lee	5b7fd708fe	Revert "[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses" (#133331 ) Reverts llvm/llvm-project#133130 This touches a common file as #133083, which is causing failures	2025-03-27 18:36:38 -04:00
Philip Reames	c90a536bcf	[CodeGen] Simplify code using TypeSize overloads of getMachineMemOperand [nfc] These were added in d584cea. This change runs through existing uses and simplifies where obvious.	2025-03-27 11:47:51 -07:00
Simon Pilgrim	a8575b3ea8	[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses (#133130 ) Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded elts across all users and call SimplifyDemandedVectorElts in "AssumeSingleUse" use.	2025-03-27 15:31:06 +00:00
LU-JOHN	2df25a4733	Invalidate range metadata when folding bitcast into load (#133095 )	2025-03-27 14:10:55 +07:00
Philip Reames	79e82b6f14	[RISCV] Use a precise size for MMO on scalable spill and fill (#133171 ) The primary effect of this is that we get proper scalable sizes printed by the assembler, but this may also enable proper aliasing analysis. I don't see any test changes resulting from the latter. Getting the size is slightly tricky as we store the scalable size as a non-scalable quantity in the object size field for the frame index. We really should remove that hack at some point... For the synthetic tuple spills and fills, I dropped the size from the split loads and stores to avoid incorrect (overly large) sizes. We could also divide by the NF factor if we felt like writing the code to do so.	2025-03-26 18:25:59 -07:00
Ethan Kaji	a629b50575	Port `NVPTXTargetLowering::LowerCONCAT_VECTORS` to SelectionDAG (#120030 ) Ports `NVPTXTargetLowering::LowerCONCAT_VECTORS` to `llvm/lib/CodeGen/SelectionDAG` as requested in https://github.com/llvm/llvm-project/issues/116695.	2025-03-27 07:40:35 +07:00
Craig Topper	6075275e68	[AsmPrinter] Don't pass Twine by value. NFC	2025-03-26 15:15:12 -07:00
Philip Reames	236f938ef6	[CodeGen] Provide a target independent default for optimizeLoadInst [NFC] This just moves the x86 implementation into generic code since it appears to be suitable for any target. The heart of this transform is inside foldMemoryOperand so other targets won't actually kick in until they implement said API. This just removes one piece to implement in the process of enabling foldMemoryOperand.	2025-03-26 08:52:40 -07:00
dianqk	66f158d918	[TailDuplicator] Determine if computed gotos using `blockaddress` (#132536 ) Using `blockaddress` should be more reliable than determining if an operand comes from a jump table index. Alternative: Add the `MachineInstr::MIFlag::ComputedGoto` flag when lowering `indirectbr`. But I don't think this approach is suitable to backport.	2025-03-26 21:27:43 +08:00
Tom Tromey	f89129af8a	Add bit stride to DICompositeType (#131680 ) In Ada, an array can be packed and the elements can take less space than their natural object size. For example, for this type: type Packed_Array is array (4 .. 8) of Boolean; pragma pack (Packed_Array); ... each element of the array occupies a single bit, even though the "natural" size for a Boolean in memory is a byte. In DWARF, this is represented by putting a DW_AT_bit_stride onto the array type itself. This patch adds a bit stride to DICompositeType so that gnat-llvm can emit DWARF for these sorts of arrays.	2025-03-25 17:14:07 -07:00
LU-JOHN	70aeb89094	Calculate KnownBits from Metadata correctly for vector loads (#128908 ) Calculate KnownBits correctly from metadata for vector loads. --------- Signed-off-by: John Lu <John.Lu@amd.com>	2025-03-25 22:46:30 +07:00
Jonathan Cohen	6785951410	[Machine-Combiner] Add a pass to reassociate chains of accumulation instructions into a tree (#132728 ) This pass is designed to increase ILP by performing accumulation into multiple registers. It currently supports only the S/UABAL accumulation instruction, but can be extended to support additional instructions. Reland of #126060 which was reverted due to a conflict with #131272.	2025-03-25 15:58:20 +02:00
Simon Pilgrim	0237216f16	[DAG] canCreateUndefOrPoison - add EXTRACT_SUBVECTOR handling (#132745 ) Similar to INSERT_SUBVECTOR - the index is constant and will be inbounds	2025-03-24 16:03:47 +00:00
Kazu Hirata	1904241a9e	[CodeGen] Avoid repeated hash lookups (NFC) (#132658 )	2025-03-24 07:46:35 -07:00
Pierre van Houtryve	c457c88951	[GlobalISel] Combine (sext (trunc x)) to (sext_inreg x) (#131622 ) Split from #131312	2025-03-24 09:32:04 +01:00
Pierre van Houtryve	6e3c24fc0a	[DAG] Combine (sext (sext_in_reg x)) to (sext_in_reg (any_extend x)) (#132386 )	2025-03-24 09:31:02 +01:00
Antonio Frighetto	ade2276517	[RegAllocFast] Ensure live-in vregs get reloaded after INLINEASM_BR spills We have already ensured in 9cec2b246e719533723562950e56c292fe5dd5ad that `INLINEASM_BR` output operands get spilled onto the stack, both in the fallthrough path and in the indirect targets. Since reloads of live-in values into physical registers contextually happen after all MIR instructions (and ops) have been visited, make sure such loads are placed at the start of the block, but after prologues or `INLINEASM_BR` spills, as otherwise this may cause stale values to be read from the stack. Fixes: #74483, #110251.	2025-03-24 09:19:53 +01:00
Fangrui Song	7e6d008023	AsmPrinter: Remove unneeded lowerRelativeReference overrides The function is only called by AsmPrinter, where there is a fallback when lowerRelativeReference returns nullptr. wasm and XCOFF could use the fallback code. (lowerRelativeReference was introduced in 2016 (https://reviews.llvm.org/D17938) for C++ relative vtables, but C++ relative vtables ended up using dso_local_equivalent. llvm/test/MC/COFF/cross-section-relative.ll also uses this.)	2025-03-23 23:58:41 -07:00
Akshat Oke	174110bf3c	[CodeGen][NPM] Port LiveDebugValues to NPM (#131563 )	2025-03-24 11:34:45 +05:30
Kazu Hirata	1019457891	[CodeGen] Use Set::insert_range (NFC) (#132651 ) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(Elem); down to: Set.insert_range(Range);	2025-03-23 21:20:44 -07:00
Mingming Liu	3b20ac00f9	[NFC]Don't use else after a return (#132644 ) A trivial code clean-up per https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return	2025-03-23 18:34:52 -07:00
Kazu Hirata	41b76119ec	[llvm] Use range constructors for *Set (NFC) (#132636 )	2025-03-23 15:50:34 -07:00
Fangrui Song	dfae1f968e	MCValue: Simplify code with getSubSym	2025-03-23 12:22:44 -07:00
Fangrui Song	b73e144bdf	MCValue: Simplify code with getSubSym MCValue::SymB is a MCSymbolRefExpr , which might become MCSymbol in the future. Simplify some code that uses MCValue::SymB.	2025-03-23 12:13:13 -07:00
Jonathan Cohen	7bda9caa49	Revert "[AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060 ) (#132607 ) This reverts commit c4caf949aa934a219e84d4ba0530bd535e698cdb.	2025-03-23 13:58:00 +02:00
Jonathan Cohen	c4caf949aa	[AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060 ) This pattern shows up often in media libraries. The optimization should only kick in for O3. Currently only supports a single family of accumulation instructions, but can easily be expanded to support additional instructions in the future.	2025-03-23 13:25:35 +02:00
Kazu Hirata	f3e8e80563	[llvm] Construct SmallVector with ArrayRef (NFC) (#132560 )	2025-03-22 13:11:31 -07:00
Kazu Hirata	cb729be11c	[CodeGen] Avoid repeated hash lookups (NFC) (#132513 )	2025-03-22 08:08:28 -07:00
Kazu Hirata	1b189cab5e	[llvm] Use *Set::insert_range (NFC) (#132509 ) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch uses insert_range in conjunction with llvm::{predecessors,successors} and MachineBasicBlock::{predecessors,successors}.	2025-03-22 08:07:33 -07:00
Mikhail R. Gadelha	f138e36d52	[SelectionDAG][RISCV] Avoid store merging across function calls (#130430 ) This patch improves DAGCombiner's handling of potential store merges by detecting function calls between loads and stores. When a function call exists in the chain between a load and its corresponding store, we avoid merging these stores if the spilling is unprofitable. We had to implement a hook on TLI, since TTI is unavailable in DAGCombine. Currently, it's only enabled for riscv. This is the DAG equivalent of PR #129258	2025-03-22 10:35:25 -03:00
Fangrui Song	4b417992dd	[CodeGen] Rename PLTRelativeVariantKind. NFC Migrate away from the deprecated MCSymbolRefExpr::VariantKind. The name "Specifier" is utilized in a few *MCExpr. > "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.	2025-03-21 23:02:08 -07:00
pzzp	d6a2cca77e	[llvm:ir] Add support for constant data exceeding 4GiB (#126481 ) The test file is over 4GiB, which is too big, so I didn’t submit it.	2025-03-21 11:44:01 -07:00
Tony Varghese	ff9c5c334a	[shrinkwrap] PowerPC's FP register should be honored when processing the save point for prologue. (#129855 ) When generating code for functions that have `__builtin_frame_address` calls and `noinline` attribute, prologue was not emitted correctly leading to an assertion failure in PowerPC. The issue was due to improper insertion of prologue for a function that contains llvm `__builtin_frame_address`. Shrink-wrap pass computes the save and restore points of a function. Default points are the entry and exit points of the function. During shrink-wrapping the frame-pointer was not honored like the stack pointer and it was considered as a callee-saved register. This change will treat the FP similar to SP and will insert the prolog on top of the instruction containing FP. --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-03-21 12:55:39 -04:00
Kazu Hirata	67a631b406	[CodeGen] Avoid repeated hash lookups (NFC) (#132329 )	2025-03-21 08:00:45 -07:00
Ryotaro Kasuga	857a04cd76	[MachinePipeliner] Fix incorrect handlings of unpipelineable insts (#126057 ) There was a case where `normalizeNonPipelinedInstructions` didn't schedule unpipelineable instructions correctly, which could generate illegal code. This patch fixes this issue by rejecting the schedule if we fail to insert the unpipelineable instructions at stage 0. Here is a part of the debug output for `sms-unpipeline-insts3.mir` before applying this patch.
```
SU(0): %27:gpr32 = PHI %21:gpr32all, %bb.3, %28:gpr32all, %bb.4
  Successors:
    SU(14): Data Latency=0 Reg=%27
    SU(15): Anti Latency=1
...
SU(14): %41:gpr32 = ADDWrr %27:gpr32, %12:gpr32common
  Predecessors:
    SU(0): Data Latency=0 Reg=%27
    SU(16): Ord Latency=0 Artificial
  Successors:
    SU(15): Data Latency=1 Reg=%41
SU(15): %28:gpr32all = COPY %41:gpr32
  Predecessors:
    SU(14): Data Latency=1 Reg=%41
    SU(0): Anti Latency=1
SU(16): %30:ppr = WHILELO_PWW_S %27:gpr32, %15:gpr32, implicit-def $nzcv
  Predecessors:
    SU(0): Data Latency=0 Reg=%27
  Successors:
    SU(14): Ord Latency=0 Artificial
...
Do not pipeline SU(16)
Do not pipeline SU(1)
Do not pipeline SU(0)
Do not pipeline SU(15)
Do not pipeline SU(14)
SU(0) is not pipelined; moving from cycle 19 to 0 Instr: ...
SU(1) is not pipelined; moving from cycle 10 to 0 Instr: ...
SU(15) is not pipelined; moving from cycle 28 to 19 Instr: ...
SU(16) is not pipelined; moving from cycle 19 to 0 Instr: ...
Schedule Found? 1 (II=10)
...
cycle 9 (1) (14) %41:gpr32 = ADDWrr %27:gpr32, %12:gpr32common
cycle 9 (1) (15) %28:gpr32all = COPY %41:gpr32
```
The SUs are traversed in the order of the original basic block, so in this case a new cycle of each instruction is determined in the order of `SU(0)`, `SU(1)`, `SU(14)`, `SU(15)`, `SU(16)`. Since there is an artificial dependence from `SU(16)` to `SU(14)`, which contradicts the original SU order, the new cycle of `SU(14)` must be greater than or equal to the cycle of `SU(16)` at that time. This results in the failure of scheduling `SU(14)` at stage 0. For now, we reject the schedule for such cases.	2025-03-21 23:07:41 +09:00
Kazu Hirata	599005686a	[llvm] Use *Set::insert_range (NFC) (#132325 ) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.	2025-03-20 22:24:06 -07:00
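
The `insert_range` cleanups listed above all replace an iterator-pair insert (or a hand-written loop) with a single range call. Here is a minimal sketch of that before/after pattern. `MiniSet` and `buildBothWays` are hypothetical stand-ins written for this illustration; they are not LLVM's `DenseSet`, `SmallSet`, or any other real LLVM type, and the sketch only assumes that both insertion styles produce the same contents.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a set container; NOT a real LLVM class.
template <typename T> class MiniSet {
  std::set<T> Storage;

public:
  // Old style: insert the half-open iterator range [First, Last).
  template <typename It> void insert(It First, It Last) {
    Storage.insert(First, Last);
  }

  // New style: insert every element of a range in one call.
  template <typename Range> void insert_range(Range &&R) {
    for (auto &&Elem : R)
      Storage.insert(Elem);
  }

  bool contains(const T &V) const { return Storage.count(V) != 0; }
  std::size_t size() const { return Storage.size(); }
};

// Fill one set with each style and return both sizes (old, new).
inline std::pair<std::size_t, std::size_t>
buildBothWays(const std::vector<int> &Src) {
  MiniSet<int> Old, New;
  Old.insert(Src.begin(), Src.end()); // before: iterator pair
  New.insert_range(Src);              // after: whole range
  return {Old.size(), New.size()};
}
```

Calling `buildBothWays` on a vector with duplicates should yield two equal sizes, since both styles deduplicate through the same underlying set. The range form simply avoids naming the source container twice.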

37497 Commits