llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	0237216f16	[DAG] canCreateUndefOrPoison - add EXTRACT_SUBVECTOR handling (#132745) Similar to INSERT_SUBVECTOR - the index is constant and will be inbounds	2025-03-24 16:03:47 +00:00
Kazu Hirata	1904241a9e	[CodeGen] Avoid repeated hash lookups (NFC) (#132658)	2025-03-24 07:46:35 -07:00
Pierre van Houtryve	c457c88951	[GlobalISel] Combine (sext (trunc x)) to (sext_inreg x) (#131622) Split from #131312	2025-03-24 09:32:04 +01:00
Pierre van Houtryve	6e3c24fc0a	[DAG] Combine (sext (sext_in_reg x)) to (sext_in_reg (any_extend x)) (#132386)	2025-03-24 09:31:02 +01:00
Antonio Frighetto	ade2276517	[RegAllocFast] Ensure live-in vregs get reloaded after INLINEASM_BR spills We have already ensured in 9cec2b246e719533723562950e56c292fe5dd5ad that `INLINEASM_BR` output operands get spilled onto the stack, both in the fallthrough path and in the indirect targets. Since reloads of live-in values into physical registers contextually happen after all MIR instructions (and ops) have been visited, make sure such loads are placed at the start of the block, but after prologues or `INLINEASM_BR` spills, as otherwise this may cause stale values to be read from the stack. Fixes: #74483, #110251.	2025-03-24 09:19:53 +01:00
Fangrui Song	7e6d008023	AsmPrinter: Remove unneeded lowerRelativeReference overrides The function is only called by AsmPrinter, where there is a fallback when lowerRelativeReference returns nullptr. wasm and XCOFF could use the fallback code. (lowerRelativeReference was introduced in 2016 (https://reviews.llvm.org/D17938) for C++ relative vtables, but C++ relative vtables ended up using dso_local_equivalent. llvm/test/MC/COFF/cross-section-relative.ll also uses this.)	2025-03-23 23:58:41 -07:00
Akshat Oke	174110bf3c	[CodeGen][NPM] Port LiveDebugValues to NPM (#131563)	2025-03-24 11:34:45 +05:30
Kazu Hirata	1019457891	[CodeGen] Use Set::insert_range (NFC) (#132651) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(Elem); down to: Set.insert_range(Range);	2025-03-23 21:20:44 -07:00
Mingming Liu	3b20ac00f9	[NFC]Don't use else after a return (#132644) A trivial code clean-up per https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return	2025-03-23 18:34:52 -07:00
Kazu Hirata	41b76119ec	[llvm] Use range constructors for *Set (NFC) (#132636)	2025-03-23 15:50:34 -07:00
Fangrui Song	dfae1f968e	MCValue: Simplify code with getSubSym	2025-03-23 12:22:44 -07:00
Fangrui Song	b73e144bdf	MCValue: Simplify code with getSubSym MCValue::SymB is a MCSymbolRefExpr, which might become MCSymbol in the future. Simplify some code that uses MCValue::SymB.	2025-03-23 12:13:13 -07:00
Jonathan Cohen	7bda9caa49	Revert "[AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060)" (#132607) This reverts commit c4caf949aa934a219e84d4ba0530bd535e698cdb.	2025-03-23 13:58:00 +02:00
Jonathan Cohen	c4caf949aa	[AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060) This pattern shows up often in media libraries. The optimization should only kick in for O3. Currently only supports a single family of accumulation instructions, but can easily be expanded to support additional instructions in the future.	2025-03-23 13:25:35 +02:00
Kazu Hirata	f3e8e80563	[llvm] Construct SmallVector with ArrayRef (NFC) (#132560)	2025-03-22 13:11:31 -07:00
Kazu Hirata	cb729be11c	[CodeGen] Avoid repeated hash lookups (NFC) (#132513)	2025-03-22 08:08:28 -07:00
Kazu Hirata	1b189cab5e	[llvm] Use *Set::insert_range (NFC) (#132509) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch uses insert_range in conjunction with llvm::{predecessors,successors} and MachineBasicBlock::{predecessors,successors}.	2025-03-22 08:07:33 -07:00
Mikhail R. Gadelha	f138e36d52	[SelectionDAG][RISCV] Avoid store merging across function calls (#130430) This patch improves DAGCombiner's handling of potential store merges by detecting function calls between loads and stores. When a function call exists in the chain between a load and its corresponding store, we avoid merging these stores if the spilling is unprofitable. We had to implement a hook on TLI, since TTI is unavailable in DAGCombine. Currently, it's only enabled for riscv. This is the DAG equivalent of PR #129258	2025-03-22 10:35:25 -03:00
Fangrui Song	4b417992dd	[CodeGen] Rename PLTRelativeVariantKind. NFC Migrate away from the deprecated MCSymbolRefExpr::VariantKind. The name "Specifier" is utilized in a few *MCExpr. > "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.	2025-03-21 23:02:08 -07:00
pzzp	d6a2cca77e	[llvm:ir] Add support for constant data exceeding 4GiB (#126481) The test file is over 4GiB, which is too big, so I didn’t submit it.	2025-03-21 11:44:01 -07:00
Tony Varghese	ff9c5c334a	[shrinkwrap] PowerPC's FP register should be honored when processing the save point for prologue. (#129855) When generating code for functions that have `__builtin_frame_address` calls and the `noinline` attribute, the prologue was not emitted correctly, leading to an assertion failure in PowerPC. The issue was due to improper insertion of the prologue for a function that contains `__builtin_frame_address`. The shrink-wrap pass computes the save and restore points of a function. Default points are the entry and exit points of the function. During shrink-wrapping the frame pointer was not honored like the stack pointer and was considered a callee-saved register. This change treats the FP like the SP and inserts the prologue on top of the instruction containing FP. --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-03-21 12:55:39 -04:00
Kazu Hirata	67a631b406	[CodeGen] Avoid repeated hash lookups (NFC) (#132329)	2025-03-21 08:00:45 -07:00
Ryotaro Kasuga	857a04cd76	[MachinePipeliner] Fix incorrect handlings of unpipelineable insts (#126057) There was a case where `normalizeNonPipelinedInstructions` didn't schedule unpipelineable instructions correctly, which could generate illegal code. This patch fixes this issue by rejecting the schedule if we fail to insert the unpipelineable instructions at stage 0. Here is a part of the debug output for `sms-unpipeline-insts3.mir` before applying this patch. ``` SU(0): %27:gpr32 = PHI %21:gpr32all, %bb.3, %28:gpr32all, %bb.4 Successors: SU(14): Data Latency=0 Reg=%27 SU(15): Anti Latency=1 ... SU(14): %41:gpr32 = ADDWrr %27:gpr32, %12:gpr32common Predecessors: SU(0): Data Latency=0 Reg=%27 SU(16): Ord Latency=0 Artificial Successors: SU(15): Data Latency=1 Reg=%41 SU(15): %28:gpr32all = COPY %41:gpr32 Predecessors: SU(14): Data Latency=1 Reg=%41 SU(0): Anti Latency=1 SU(16): %30:ppr = WHILELO_PWW_S %27:gpr32, %15:gpr32, implicit-def $nzcv Predecessors: SU(0): Data Latency=0 Reg=%27 Successors: SU(14): Ord Latency=0 Artificial ... Do not pipeline SU(16) Do not pipeline SU(1) Do not pipeline SU(0) Do not pipeline SU(15) Do not pipeline SU(14) SU(0) is not pipelined; moving from cycle 19 to 0 Instr: ... SU(1) is not pipelined; moving from cycle 10 to 0 Instr: ... SU(15) is not pipelined; moving from cycle 28 to 19 Instr: ... SU(16) is not pipelined; moving from cycle 19 to 0 Instr: ... Schedule Found? 1 (II=10) ... cycle 9 (1) (14) %41:gpr32 = ADDWrr %27:gpr32, %12:gpr32common cycle 9 (1) (15) %28:gpr32all = COPY %41:gpr32 ``` The SUs are traversed in the order of the original basic block, so in this case a new cycle of each instruction is determined in the order of `SU(0)`, `SU(1)`, `SU(14)`, `SU(15)`, `SU(16)`. Since there is an artificial dependence from `SU(16)` to `SU(14)`, which contradicts the original SU order, the new cycle of `SU(14)` must be greater than or equal to the cycle of `SU(16)` at that time. This results in the failure of scheduling `SU(14)` at stage 0. For now, we reject the schedule for such cases.	2025-03-21 23:07:41 +09:00
Kazu Hirata	599005686a	[llvm] Use *Set::insert_range (NFC) (#132325) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.	2025-03-20 22:24:06 -07:00
yonghong-song	0ffe83feac	[SelectionDAG] Not issue TRAP node if naked function (#132147) In [1], Nikita Popov suggested that during lowering 'unreachable' insn should not generate extra code for naked functions, and this applies to all architectures. Note that for naked functions, 'unreachable' insn is necessary in IR since the basic block needs a terminator to end. This patch checks whether a function is a naked function. If it is, the 'unreachable' insn will not generate ISD::TRAP. [1] https://github.com/llvm/llvm-project/pull/131731 Co-authored-by: Yonghong Song <yonghong.song@linux.dev>	2025-03-20 18:18:03 -07:00
Michael Maitland	00fabd21bc	[RISCV][RegAlloc] Add getCSRFirstUseCost for RISC-V (#131349) This is based off of 63efd8e7e68bc. The following table shows the percent change to the dynamic instruction count when the function in this patch returns 0 (default) versus other values.

| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16 over 0 | % speedup 64 over 0 | % speedup 128 over 0 |
| --------------- | ---------------------- | --------------------- | --------------------- | -------------------- | -------------------- |
| 500.perlbench_r | 0.001018570165 | 0.001049508358 | 0.001001106529 | 0.03382582818 | 0.03395354577 |
| 502.gcc_r | 0.02850551412 | 0.02170512371 | 0.01453021263 | 0.06011008637 | 0.1215691521 |
| 505.mcf_r | -0.00009506373338 | -0.00009090057642 | -0.0000860991497 | -0.00005027849766 | 0.00001251173791 |
| 520.omnetpp_r | 0.2958940288 | 0.2959715925 | 0.2961141505 | 0.2959823497 | 0.2963124341 |
| 523.xalancbmk_r | -0.0327074721 | -0.01037021046 | -0.3226810542 | 0.02127133714 | 0.02765388389 |
| 525.x264_r | 0.0000001381714403 | -0.00000007041540345 | -0.00000002156399465 | 0.0000002108993364 | 0.0000002463382874 |
| 531.deepsjeng_r | 0.00000000339777238 | 0.000000003874652714 | 0.000000003636212547 | 0.000000003874652714 | 0.000000003159332213 |
| 541.leela_r | 0.0009186059953 | -0.000424159199 | 0.0004984456879 | 0.274948447 | 0.8135521414 |
| 557.xz_r | -0.000000003547118854 | -0.00004896449559 | -0.00004910691576 | -0.0000491109983 | -0.00004895599589 |
| geomean | 0.03265937388 | 0.03424232324 | -0.00107917442 | 0.07629116165 | 0.1439913192 |

The following table shows the percent change to the runtime when the function in this patch returns 0 (default) versus other values.

| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16 over 0 | % speedup 64 over 0 | % speedup 128 over 0 |
| --------------- | ------------------ | ------------------ | ------------------- | ------------------- | ------------------- |
| 500.perlbench_r | 0.1722356761 | 0.2269681109 | 0.2596825578 | 0.361573851 | 1.15041305 |
| 502.gcc_r | -0.548415855 | -0.06187002799 | -0.5553684674 | -0.8876686237 | -0.4668665535 |
| 505.mcf_r | -0.8786414258 | -0.4150938441 | -1.035517726 | -0.1860770377 | -0.01904825648 |
| 520.omnetpp_r | 0.4130256072 | 0.6595976188 | 0.897332171 | 0.6252625622 | 0.3869467278 |
| 523.xalancbmk_r | 1.318132014 | -0.003927574 | 1.025962975 | 1.090320253 | -0.789206202 |
| 525.x264_r | -0.03112871796 | -0.00167557587 | 0.06932423155 | -0.1919840015 | -0.1203585732 |
| 531.deepsjeng_r | -0.259516072 | -0.01973455652 | -0.2723227894 | -0.005417022257 | -0.02222388177 |
| 541.leela_r | -0.3497178495 | -0.3510447393 | 0.1274508001 | 0.6485542452 | 0.2880651727 |
| 557.xz_r | 0.7683565263 | -0.2197509447 | -0.0431183874 | 0.07518130872 | 0.5236853039 |
| geomean | 0.06506952742 | -0.0211865386 | 0.05072694648 | 0.1684530637 | 0.1020533557 |

I chose to set the value to 5 on RISC-V because it improves both the dynamic IC and the runtime, showed good results empirically, and had an effect similar to setting it to higher numbers. I looked at some diffs and it seems like this patch leads to two things: 1. Less spilling -- not spilling the CSR led to better register allocation and helped us avoid spills down the line 2. Avoid spilling CSR but spill more on paths that static heuristics estimate as cold.	2025-03-20 15:20:04 -04:00
Philip Reames	4d4d9d5d33	[TTI] Use TypeSize in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#132244) Motivation is supporting scalable spills and reloads, e.g. in https://github.com/llvm/llvm-project/pull/120524. Looking at this API, I'm suspicious that the access size should just be coming from the memory operand on the load or store, but we don't appear to be consistently setting that up. That's a larger change so I may or may not bother pursuing that.	2025-03-20 10:17:36 -07:00
Phoebe Wang	64555e3d48	[X86][NFCI] Add IsStore parameter to hasConditionalLoadStoreForType (#132153) Address https://github.com/llvm/llvm-project/pull/132032#issuecomment-2736936769	2025-03-20 18:25:09 +08:00
Nikita Popov	0738f70615	[Intrinsics] Add Intrinsic::getFnAttributes() (NFC) (#132029) Most places that call Intrinsic::getAttributes() are only interested in the function attributes, so add a separate function for that. The motivation for this is that I'd like to add the ability to specify range attributes on intrinsics, which requires knowing the function type. This avoids needing to know the type for most attribute queries.	2025-03-20 09:20:39 +01:00
Diana Picus	e17b3cdfb3	[AMDGPU] Dynamic VGPR support for llvm.amdgcn.cs.chain (#130094) The llvm.amdgcn.cs.chain intrinsic has a 'flags' operand which may indicate that we want to reallocate the VGPRs before performing the call. A call with the following arguments: ``` llvm.amdgcn.cs.chain %callee, %exec, %sgpr_args, %vgpr_args, /*flags*/0x1, %num_vgprs, %fallback_exec, %fallback_callee ``` is supposed to do the following: - copy the SGPR and VGPR args into their respective registers - try to change the VGPR allocation - if the allocation has succeeded, set EXEC to %exec and jump to %callee, otherwise set EXEC to %fallback_exec and jump to %fallback_callee This patch implements the dynamic VGPR behaviour by generating an S_ALLOC_VGPR followed by S_CSELECT_B32/64 instructions for the EXEC and callee. The rest of the call sequence is left undisturbed (i.e. identical to the case where the flags are 0 and we don't use dynamic VGPRs). We achieve this by introducing some new pseudos (SI_CS_CHAIN_TC_Wn_DVGPR) which are expanded in the SILateBranchLowering pass, just like the simpler SI_CS_CHAIN_TC_Wn pseudos. The main reason is so that we don't risk other passes (particularly the PostRA scheduler) introducing instructions between the S_ALLOC_VGPR and the jump. Such instructions might end up using VGPRs that have been deallocated, or the wrong EXEC mask. Once the whole backend treats S_ALLOC_VGPR and changes to EXEC as barriers for instructions that use VGPRs, we could in principle move the expansion earlier (but in the absence of a good reason for that my personal preference is to keep it later in order to make debugging easier). Since the expansion happens after register allocation, we're careful to select constants to immediate operands instead of letting ISel generate S_MOVs which could interfere with register allocation (i.e. make it look like we need more registers than we actually do). For GFX12, S_ALLOC_VGPR only works in wave32 mode, so we bail out during ISel in wave64 mode. However, we can define the pseudos for wave64 too so it's easy to handle if future generations support it. --------- Co-authored-by: Ana Mihajlovic <Ana.Mihajlovic@amd.com> Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2025-03-20 08:38:04 +01:00
David Green	53a395fda3	[AArch64][GlobalISel] Legalize more CTPOP vector types. (#131513) Similar to other operations, s8, s16, s32 and s64 vector elements are clamped to legal vector sizes, odd numbers of elements are widened to the next power of 2 and s128 is scalarized. This helps legalize cttz as well as ctpop.	2025-03-20 07:21:01 +00:00
David Green	b0876994eb	[AArch64][GlobalISel] Clean up CTLZ vector type legalization. (#131514) Similar to other operations, s8, s16 and s32 vector elements are clamped to legal vector sizes, but in this case s64 are scalarized to use the gpr instructions. This allows vector types to split as opposed to scalarizing.	2025-03-19 19:28:36 +00:00
Nicholas Guy	3f4b2f12a1	[llvm] Fix crash when complex deinterleaving operates on an unrolled loop (#129735) When attempting to perform complex deinterleaving on an unrolled loop containing a reduction, the complex deinterleaving pass would fail to accommodate the wider types when accumulating the unrolled paths. Instead of trying to alter the incoming IR to fit expectations, the pass should instead decide against processing any reduction that results in a non-complex or non-vector value.	2025-03-19 13:44:02 +00:00
Matt Arsenault	5b6b4fdb4b	DAG: Fix promote of half freeze (#131844)	2025-03-19 18:30:34 +07:00
Alex Bradbury	6db2594c48	[PreISelIntrinsicLowering] Zext/trunc count parameter as necessary for memset_pattern16 emission (#129239) This patch cleans up the handling of the count parameter in general, though it was initially motivated by a compiler crash on a memset.pattern with a narrow count, caused by different types for CreateMul when converting the count to the number of bytes. The logic used to name globals means there is some minor renaming churn in the output to test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll irrelevant to the newly added tests (that would crash before).	2025-03-19 11:16:24 +00:00
Nikita Popov	8f66fb7842	[GlobalMerge] Fix handling of const options For the NewPM, the merge-const option was assigned to an unused option field. Assign it to the correct one. The merge-const-aggressive option was not supported -- and invalid options were silently ignored. Accept it and error on invalid options. For the LegacyPM, the corresponding cl::opt options were ignored when called via opt rather than llc.	2025-03-18 15:06:39 +01:00
David Green	bd1be8a242	[CodeGen][GlobalISel] Add a getVectorIdxWidth and getVectorIdxLLT. (#131526) From #106446, this adds a variant of getVectorIdxTy that returns an LLT. Many uses only look at the width, so a getVectorIdxWidth was added as the common base.	2025-03-18 08:31:11 +00:00
Kazu Hirata	62204482c0	[CodeGen] Avoid repeated hash lookups (NFC) (#131722)	2025-03-18 00:26:59 -07:00
Akshat Oke	6be6400848	[LiveDebugValues][NFC] Remove TargetPassConfig from LDVImpl (#131562) TPC is only used to access the option `ShouldEmitDebugEntryValues`.	2025-03-18 11:04:54 +05:30
Jim Lin	00cad3ed22	[SDAG] Handle extract_subvector in isKnownNeverNaN (#131581) Propagate nnan across extract_subvector.	2025-03-18 09:37:16 +08:00
Pierre van Houtryve	7dcea28bf9	[AMDGPU] Add identity_combines to RegBankCombiner (#131305)	2025-03-17 10:11:28 +01:00
Kazu Hirata	05607a3f39	[CodeGen] Avoid repeated hash lookups (NFC) (#131551)	2025-03-16 23:52:16 -07:00
Hua Tian	b09b9ac108	[llvm][CodeGen] Fix the empty interval issue in Window Scheduler (#129204) The interval of a newly generated reg in ModuloScheduleExpander is empty. This can cause a crash in some corner cases. This patch recalculates the live intervals of these regs.	2025-03-17 14:28:47 +08:00
Akshat Oke	687c9d359e	[CodeGen][NPM] Port FEntryInserter to NPM (#129857)	2025-03-17 10:35:53 +05:30
Kazu Hirata	b648576528	[CodeGen] Avoid repeated hash lookups (NFC) (#131495)	2025-03-16 11:02:57 -07:00
Fangrui Song	7722d7519c	[MC] evaluateAsRelocatableImpl: remove the Fixup argument Follow-up to d6fbffa23c84e622735b3e880fd800985c1c0072. This commit updates all call sites and removes the argument from the function.	2025-03-15 16:10:19 -07:00
Kazu Hirata	6a1fd24e9a	[CodeGen] Avoid repeated hash lookups (NFC) (#131422)	2025-03-15 09:11:59 -07:00
Craig Topper	3b5413c77f	[CodeGen] Use MCRegister in DbgVariableLocation. NFC	2025-03-15 00:46:17 -07:00
Matthias Braun	e6382f2111	SelectionDAG: neg (and x, 1) --> SIGN_EXTEND_INREG x, i1 (#131239) The pattern ```LLVM %shl = shl i32 %x, 31 %ashr = ashr i32 %shl, 31 ``` would be combined to `SIGN_EXTEND_INREG %x, ValueType:ch:i1` by SelectionDAG. However InstCombine normalizes this pattern to: ```LLVM %and = and i32 %x, 1 %neg = sub i32 0, %and ``` This adds matching code to DAGCombiner to catch this variant as well.	2025-03-14 10:47:56 -07:00
Philip Reames	bdb4012fe3	[CodeGen] Remove parameter from LiveRangeEdit::canRematerializeAt [NFC] Only one caller cares about the true case of this parameter, so move the check to that single caller. Note that RegisterCoalescer seems like it should care, but it already duplicates the check several lines above.	2025-03-14 09:12:07 -07:00
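
The `neg (and x, 1)` entry above (e6382f2111) relies on two IR patterns computing the same value: both sign-extend bit 0 of `x` to the full width. The sketch below checks that equivalence on plain 32-bit integers. The function names are hypothetical and are not part of LLVM, and it assumes `>>` on a negative `int32_t` is an arithmetic shift (guaranteed since C++20, and what mainstream compilers do before that).

```cpp
#include <cassert>
#include <cstdint>

// shl/ashr form: move bit 0 into the sign position, then shift it back
// arithmetically. The left shift is done on an unsigned value to avoid
// signed-overflow undefined behavior.
// Assumption: >> on a negative int32_t is an arithmetic shift.
int32_t signExtendBit0ViaShifts(uint32_t X) {
  return static_cast<int32_t>(X << 31) >> 31;
}

// Form that the commit message says InstCombine produces: 0 - (X & 1).
int32_t signExtendBit0ViaNeg(uint32_t X) {
  return 0 - static_cast<int32_t>(X & 1u);
}

// Asserts that both forms agree on a few sample inputs, including the
// extremes of the 32-bit range.
void checkSignExtendForms() {
  const uint32_t Samples[] = {0u,          1u,          2u,         3u,
                              0x7fffffffu, 0x80000000u, 0xffffffffu};
  for (uint32_t X : Samples)
    assert(signExtendBit0ViaShifts(X) == signExtendBit0ViaNeg(X));
}
```

Both functions return -1 when bit 0 is set and 0 otherwise, which is the result of sign-extending a 1-bit value.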

37471 Commits
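
Several entries in this log (1019457891, 1b189cab5e, 599005686a) replace element-by-element or iterator-pair insertion with a single `insert_range` call. The sketch below illustrates that before/after shape using only the standard library. `insertRange` is a hypothetical free-function stand-in written for this example; it is not LLVM's `insert_range` member and makes no claim about how the LLVM containers implement it.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a set's insert_range member: inserts every
// element of Range into Dest. Not LLVM's implementation.
template <typename SetT, typename RangeT>
void insertRange(SetT &Dest, const RangeT &Range) {
  Dest.insert(Range.begin(), Range.end());
}

// Builds the same set with an explicit loop and with the range helper,
// and returns true when the two results match.
bool loopAndRangeAgree(const std::vector<int> &Values) {
  std::set<int> ViaLoop;
  for (int Elem : Values) // "before" shape: one insert per element
    ViaLoop.insert(Elem);

  std::set<int> ViaRange;
  insertRange(ViaRange, Values); // "after" shape: one call for the range
  return ViaLoop == ViaRange;
}

// Asserts the equivalence on an input with duplicates and on an empty one.
void checkInsertRange() {
  assert(loopAndRangeAgree({3, 1, 3, 2}));
  assert(loopAndRangeAgree({}));
}
```

The range form states the intent in one call and leaves no loop variable to mistype.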