llvm-project

Author	SHA1	Message	Date
AZero13	733c1aded1	[ARM] Replace ABS and tABS machine nodes with custom lowering (#156717) Just do a custom lowering instead. Also copy paste the cmov-neg fold to prevent regressions in nabs.	2025-09-19 19:43:36 +01:00
Stanislav Mekhanoshin	8fcb712167	[AMDGPU] gfx1250 runlines for global-atomicrmw-fadd.ll. NFC (#159817)	2025-09-19 10:58:41 -07:00
Sam Clegg	cac54a8ad0	[WebAssembly] Require tags for Wasm EH and Wasm SJLJ to be defined externally (#159143) Rather than defining these tags in each object file that requires them, we can declare them as undefined and require that they be defined externally in, for example, compiler-rt or libcxxabi.	2025-09-19 10:11:15 -07:00
Akash Dutta	c256966fe2	[AMDGPU]: Unpack packed instructions overlapped by MFMAs post-RA scheduling (#157968) This is a cleaned up version of PR #151704. These optimizations are now performed post-RA scheduling.	2025-09-19 09:41:02 -07:00
Craig Topper	6119d1f115	[RISCV] Re-work how VWADD_W_VL and similar _W_VL nodes are handled in combineOp_VLToVWOp_VL. (#159205) These instructions have one operand that is already narrow. Previously, we pretended this operand was a supported extension. This could cause problems when we called getOrCreateExtendedOp on this narrow operand when creating the VWADD_VL. If the narrow operand happened to be an extend of the opposite type, we would peek through it and then rebuild it with the wrong extension type. So (vwadd_w_vl (i32 (sext X)), (i16 (zext Y))) would become (vwadd_vl (i16 (sext X)), (i16 (sext Y))). To prevent this, we ignore the operand instead and pass std::nullopt for SupportsExt to getOrCreateExtendedOp so it won't peek through any extends on the narrow source. Fixes #159152.	2025-09-19 09:19:57 -07:00
RolandF77	1eb575dcae	[PowerPC] Fix vector extend result types in BUILD_VECTOR lowering (#159398) The result type of the vector extend intrinsics generated by the BUILD_VECTOR lowering code should match how they are actually defined. Currently the result type there defaults to the operand type. This can conflict with calls to the same intrinsic from other paths.	2025-09-19 10:43:22 -04:00
Jeffrey Byrnes	ac8f3cdcf3	[AMDGPU] Precommit test for memory intrinics CGP handling Change-Id: Id229f849b1d8552bbe59d6e18114042ef1614fad	2025-09-19 07:42:26 -07:00
zhijian lin	be6c4d933d	[PowerPC] using milicode call for strlen instead of lib call (#153600) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.	2025-09-19 10:02:21 -04:00
Mikhail Gudim	562146499c	[CodeGen][NewPM] Port `ReachingDefAnalysis` to new pass manager. (#159572) In this commit: (1) Added new pass manager support for `ReachingDefAnalysis`. (2) Added a printer pass. (3) Made the old pass manager use `ReachingDefInfoWrapperPass`.	2025-09-19 09:38:34 -04:00
Simon Pilgrim	188c7ed171	[X86] Add test coverage for #159670 (#159767)	2025-09-19 13:09:32 +00:00
Paul Walker	b7e4edca3d	[LLVM][CodeGen] Update PPCFastISel::SelectRet for ConstantInt based vectors. (#159331) The current implementation assumes ConstantInt return values are scalar, which is not true when use-constant-int-for-fixed-length-splat is enabled.	2025-09-19 13:15:57 +01:00
Mariusz Sikora	eed99d5008	[AMDGPU] Fix the magic number RegisterClass for SReg_32 in test (#159761)	2025-09-19 14:14:33 +02:00
Hongyu Chen	fba55c89c3	[X86] Fold X * 1 + Z --> X + Z for VPMADD52L (#158516) This patch implements the fold `lo(X * 1) + Z --> lo(X) + Z --> X + Z iff X == lo(X)`.	2025-09-19 19:35:05 +08:00
AZero13	a05e8d506b	[X86] Allow all legal integers to optimize smin with 0 (#151893) There is no reason for smin to be limited to 32 and 64 bits. hasAndNot only exists for 32 and 64 bits, so this does not affect smax.	2025-09-19 11:08:06 +00:00
Fabian Ritter	d5607694e1	[AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (#146075) If we can't fold a PTRADD's offset into its users, lowering them to disjoint ORs is preferable: Often, a 32-bit OR instruction suffices where we'd otherwise use a pair of 32-bit additions with carry. This needs to be a DAGCombine (and not a selection rule) because its main purpose is to enable subsequent DAGCombines for bitwise operations. We don't want to just turn PTRADDs into disjoint ORs whenever that's sound because this transform loses the information that the operation implements pointer arithmetic, which AMDGPU for instance needs when folding constant offsets. For SWDEV-516125.	2025-09-19 11:58:41 +02:00
UmeshKalappa	b59d410202	RISC-V: builtins support for MIPS RV64 P8700 execution control. The following changes are made: a) typo fix (from previous PR https://github.com/llvm/llvm-project/pull/155747); b) builtins support for MIPS P8700 execution control instructions; c) test case.	2025-09-19 15:10:28 +05:30
Fabian Ritter	a2dcc88f39	[AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (#145330) There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp that check for ISD::ADD in a pointer context, but as far as I can tell those are only relevant for 32-bit pointer arithmetic (like frame indices/scratch addresses and LDS), for which we don't enable PTRADD generation yet. For SWDEV-516125.	2025-09-19 10:19:38 +02:00
Jim Lin	e747223c03	[RISCV] Implement MC support for Zvfofp8min extension (#157014) This patch adds MC support for Zvfofp8min https://github.com/aswaterman/riscv-misc/blob/main/isa/zvfofp8min.adoc.	2025-09-19 07:49:31 +00:00
Fabian Ritter	adfa6a4c14	[AMDGPU][SDAG] Test ISD::PTRADD handling in various special cases (#145329) Pre-committing tests to show improvements in a follow-up PR.	2025-09-19 09:43:30 +02:00
David Green	ebe7587256	[AArch64] Add some tests for bitcast vector loads and scalarizing loaded vectors. NFC	2025-09-19 07:49:22 +01:00
Jianjian Guan	332eb5f693	[RISCV][GISel] Support select vx, vf form rvv intrinsics (#157398) For the vx form, we legalize it by widening the scalar; for the vf form, we select the right register bank.	2025-09-19 14:30:48 +08:00
ZhaoQi	680c657a4f	[LoongArch] Simplily fix extractelement on LA32 (#159564)	2025-09-19 14:14:55 +08:00
Luke Lau	7a77127c0f	[RISCV] Ignore debug instructions in RISCVVLOptimizer (#159616) Don't put them onto the worklist, since they'll crash when we try to check their opcode. Fixes #159422	2025-09-19 12:22:44 +08:00
ZhaoQi	1ad5d63e5e	[LoongArch] Add generation support for `[x]vnori.b` (#158772)	2025-09-19 09:34:11 +08:00
Matt Arsenault	116ca9522e	Greedy: Take copy hints involving subregisters (#159570) Previously this would only accept full copy hints. This relaxes this to accept some subregister copies. Specifically, this now accepts: - Copies to/from physical registers if there is a compatible super register - Subreg-to-subreg copies. This has the potential to repeatedly add the same hint to the hint vector, but it is not clear whether that is a real problem.	2025-09-19 09:37:36 +09:00
Matt Arsenault	33e8e5a846	AMDGPU: Add more mfma loop test cases (#159492) Test cases where the exit uses must be VGPRs, and don't happen to be a store that could use AGPRs.	2025-09-19 09:36:46 +09:00
Craig Topper	0c1ab02e46	[RISCV] Use bseti 31 for (or X, -2147483648) when upper 32 bits aren't used. (#159678) If the original type was i32, type legalization will sign extend the constant. This prevents it from having a single bit set or clear, so other patterns can't match. If the upper bits aren't used, we can ignore the sign extension. The same applies to bclri and binvi.	2025-09-18 17:33:08 -07:00
Sam Elliott	dda7ce6624	[RISCV] Move Xqci Select-likes to use riscv_selectcc (#153147) The original patterns for the Xqci select-like instructions used `select`, and marked that ISD node as legal. This is not the usual way that `select` is dealt with in the RISC-V backend. Usually on RISC-V, we expand `select` to `riscv_select_cc`, which holds references to the operands of the comparison and the possible values depending on the comparison. In retrospect, this is a much better fit for our instructions, as most of them correspond to specific condition codes, rather than a more generic `select` with a truthy/falsey value. This PR moves the Xqci select-like patterns to use `riscv_select_cc` nodes. This applies to the Xqcicm, Xqcics and Xqcicli instruction patterns. In order to match the existing codegen, minor additions had to be made to `translateSetCCForBranch` to ensure that comparisons against specific immediate values are left in a form that can be matched more closely by the instructions. This avoids having to insert additional `li` instructions and use the register forms. There are a few slight regressions: - There are sometimes more `mv` instructions than strictly necessary. I believe these would not be seen with larger examples where the register allocator has more leeway. - In some tests where just one of the three extensions is enabled, codegen falls back to using a branch over a move. With all three extensions enabled (the configuration we most care about), these are not seen. - The generated patterns are very similar to each other: they have similar complexity (7 or 8) and there are still overlaps. Sometimes the choice between two instructions can be affected by the order of the patterns in the tablegen file. One other change is that Xqcicm instructions are prioritised over Xqcics instructions where they have identical patterns. This is done because one of the Xqcicm instructions is compressible (`qc.mveqi`), while none of the Xqcics instructions are.	2025-09-19 00:16:44 +00:00
Stanislav Mekhanoshin	6ac0abf8c4	[AMDGPU] gfx1251 VOP3 dpp support (#159654)	2025-09-18 16:18:09 -07:00
Stanislav Mekhanoshin	8cfbace7b2	[AMDGPU] gfx1251 VOP2 dpp support (#159641)	2025-09-18 15:38:29 -07:00
Stanislav Mekhanoshin	e3c7b7f806	[AMDGPU] gfx1251 VOP1 dpp support (#159637)	2025-09-18 13:42:06 -07:00
Shaoce SUN	c3383d74a7	[RISCV][GlobalIsel] Remove redundant sext.w for ADDIW (#159597) This is the minimal case generated by clang at `-O0`; I'm not sure if writing the test this way is appropriate.	2025-09-18 17:29:54 +00:00
Alexey Karyakin	bbcb5f421d	Shuffle patterns to vdeal + vpack (#159464) Lowering shuffle patterns to vdeal + vpack caused an assertion because the vdeal parameter value is negative but an unsigned one was expected.	2025-09-18 11:55:46 -05:00
Simon Pilgrim	6e47bff24d	[AMDGPU] callee-special-input-vgprs.ll / callee-special-input-vgprs-packed.ll - regenerate test coverage (#159587)	2025-09-18 15:19:48 +00:00
Fabian Ritter	01b4b2a5b8	[AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (#143881) This patch mirrors similar patterns for ISD::ADD. The main difference is that ISD::ADD is commutative, so that a pattern definition for, e.g., (add (mul x, y), z), automatically also handles (add z, (mul x, y)). ISD::PTRADD is not commutative, so we would need to handle these cases explicitly. This patch only implements (ptradd z, (op x, y)) patterns, where the nested operation (shift or multiply) is the offset of the ptradd (i.e., the right operand), since base pointers that are the result of a shift or multiply seem less likely. For SWDEV-516125.	2025-09-18 15:01:07 +02:00
Petar Avramovic	2ec7959b96	[AMDGPU][SIInsertWaitcnts] Track SCC. Insert KM_CNT waits for SCC writes. (#157843) Add a new event, SCC_WRITE, for s_barrier_signal_isfirst and s_barrier_leave, the instructions that write to SCC; its counter is KM_CNT. Also start tracking SCC for reads and writes. An s_barrier_wait on the same barrier guarantees that the SCC write from s_barrier_signal_isfirst has landed, so there is no need to insert s_wait_kmcnt.	2025-09-18 14:41:01 +02:00
Simon Pilgrim	85527609a0	[AMDGPU] kernel-argument-dag-lowering.ll - regenerate test coverage (#159526)	2025-09-18 09:34:38 +00:00
Simon Pilgrim	573b3775e4	[X86] Add test coverage for #158649 (#159524) Demonstrates the failure to keep avx512 mask predicate bit manipulation patterns (based off the BMI1/BMI2/TBM style patterns) on the predicate registers: unless the pattern is particularly complex, the cost of transferring to/from GPRs outweighs any gains from better scalar instructions. I've been rather random with the mask types for the tests; I can adjust them later if there are particular cases of interest.	2025-09-18 09:33:10 +00:00
David Green	d76d0a5139	[AArch64] Regenerate and update a number of check lines. NFC	2025-09-18 09:54:47 +01:00
yingopq	ddf0f6fe91	Revert "[Mips] Fix atomic min/max generate mips4 instructions when compiling for mips2" (#159495) Reverts llvm/llvm-project#149983	2025-09-18 15:07:15 +08:00
Jim Lin	8548fa00f1	[RISCV] Match fmaxnum and fminnum to reduction ops. (#159244) This patch tries to match fmaxnum and fminnum to vector reductions.	2025-09-18 11:11:52 +08:00
Boyao Wang	27f8f9e1f1	[RISCV][CodeGen] Add CodeGen support of Zibi experimental extension (#146858) This adds the CodeGen support of Zibi v0.1 experimental extension, which depends on #127463.	2025-09-18 11:03:48 +08:00
woruyu	1a172b9924	[RISCV][GISel] Lower G_SSUBE (#157855) Summary: Try to implement lowering of G_SSUBE in LegalizerHelper::lower.	2025-09-18 10:08:56 +08:00
hev	7ca448e479	[LoongArch] Fix MergeBaseOffset for constant pool index operand (#159336) Fixes #159200	2025-09-18 10:06:33 +08:00
Craig Topper	38f2a1cb9b	[RISCV][GISel] Test legalizing s64 G_UADDE on RV32. And s128 on RV64. NFC (#159412)	2025-09-17 17:23:28 -07:00
Stanislav Mekhanoshin	221f8eef9d	[AMDGPU] Add gfx1251 runlines to cooperative atomcis tests. NFC (#159437)	2025-09-17 14:08:05 -07:00
Björn Pettersson	1c4c7bd808	[SelectionDAG] Deal with POISON for INSERT_VECTOR_ELT/INSERT_SUBVECTOR (#143102) As reported in https://github.com/llvm/llvm-project/issues/141034, SelectionDAG::getNode had some unexpected behaviors when trying to create vectors with UNDEF elements. Since we treat both UNDEF and POISON as undefined (when using isUndef()), we can't just fold away INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on isUndef(), as that could make the resulting vector more poisonous. The same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR. Here are some examples. This fold was done even if vec[idx] was POISON: INSERT_VECTOR_ELT vec, UNDEF, idx -> vec. This fold was done even if any of vec[idx..idx+size] was POISON: INSERT_SUBVECTOR vec, UNDEF, idx -> vec. This fold was done even if the elements not extracted from vec could be POISON: sub = EXTRACT_SUBVECTOR vec, idx; INSERT_SUBVECTOR UNDEF, sub, idx -> vec. With this patch we avoid such folds unless we can prove that the result isn't more poisonous when eliminating the insert. Fixes https://github.com/llvm/llvm-project/issues/141034	2025-09-17 21:04:00 +00:00
Stanislav Mekhanoshin	e556dc0b23	[AMDGPU] Add gfx1251 subtarget (#159430)	2025-09-17 13:02:02 -07:00
Ying Wang	4bac9d4911	[RISCV] Add isel for bitcasting between bfloat and half types (#158828) There is no RISCV isel for bitcasts between f16 and bf16, which triggers a "cannot select" fatal error. Co-authored-by: Ying Wang <wy446777@alibaba-inc.com>	2025-09-17 12:10:47 -07:00
Vladislav Dzhidzhoev	432b58915a	[DebugInfo][DwarfDebug] Separate creation and population of abstract subprogram DIEs (#159104) With this change, construction of abstract subprogram DIEs is split into two stages/functions: creation of the DIE (in DwarfCompileUnit::getOrCreateAbstractSubprogramDIE) and its population with children (in DwarfCompileUnit::constructAbstractSubprogramScopeDIE). With that, abstract subprograms can be created/referenced from DwarfDebug::beginModule, which should solve the issue with static local variable DIE creation for inlined functions with optimized-out definitions. It fixes https://github.com/llvm/llvm-project/issues/29985. The LexicalScopes class now stores a mapping from DISubprograms to their corresponding llvm::Functions. It is supposed to be built before processing of each function (so the LexicalScopes class now has a method for "module initialization" alongside the method for "function initialization"). It is used by DwarfCompileUnit to determine whether a DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is invoked. A DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can create an abstract or a concrete DIE for a subprogram. It accepts an llvm::Function* argument to determine whether a concrete DIE must be created. This is a temporary fix for https://github.com/llvm/llvm-project/issues/29985. Ideally, it will be fixed by moving global variables and types emission to DwarfDebug::endModule (https://reviews.llvm.org/D144007, https://reviews.llvm.org/D144005). Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in https://github.com/llvm/llvm-project/pull/90523 was taken for this commit.	2025-09-17 20:06:49 +02:00


61091 Commits
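The reasoning in commit 0c1ab02e46 (bseti 31 for (or X, -2147483648) when the upper 32 bits aren't used) can be checked with a small bit-level model. This is a hypothetical Python sketch, not LLVM code: it assumes RV64 with 64-bit registers, models register values as unsigned 64-bit integers, and the helper names (sext32, or_sext_imm, bseti31 and so on) are illustrative only. It shows that the sign-extended i32 constant touches bits 31 through 63, whereas bseti/bclri/binvi with index 31 touch only bit 31, and that both forms agree in the low 32 bits.

```python
# Hypothetical bit-level model of the RV64 reasoning; not LLVM code.
MASK64 = (1 << 64) - 1  # register values modeled as unsigned 64-bit integers


def sext32(imm: int) -> int:
    """Sign-extend a 32-bit immediate to a 64-bit pattern."""
    imm &= 0xFFFFFFFF
    return (imm | 0xFFFFFFFF00000000) if imm & 0x80000000 else imm


def low32(v: int) -> int:
    """The only bits a consumer of the original i32 value looks at."""
    return v & 0xFFFFFFFF


# After type legalization the i32 constant is sign-extended to 64 bits.
def or_sext_imm(x: int) -> int:   # (or X, -2147483648): sets bits 31..63
    return (x | sext32(-2147483648)) & MASK64


def and_sext_imm(x: int) -> int:  # (and X, 0x7fffffff): clears bits 31..63
    return x & sext32(0x7FFFFFFF)


def xor_sext_imm(x: int) -> int:  # (xor X, -2147483648): flips bits 31..63
    return (x ^ sext32(-2147483648)) & MASK64


# Single-bit operations on bit 31, as bseti/bclri/binvi with index 31 would do.
def bseti31(x: int) -> int:
    return (x | (1 << 31)) & MASK64


def bclri31(x: int) -> int:
    return x & ~(1 << 31) & MASK64


def binvi31(x: int) -> int:
    return (x ^ (1 << 31)) & MASK64


if __name__ == "__main__":
    for x in (0, 0x123456789ABCDEF0, MASK64):
        # The results may differ in the upper 32 bits but always agree in the low 32.
        assert low32(or_sext_imm(x)) == low32(bseti31(x))
        assert low32(and_sext_imm(x)) == low32(bclri31(x))
        assert low32(xor_sext_imm(x)) == low32(binvi31(x))
    print(hex(or_sext_imm(0)), hex(bseti31(0)))
```

Run as a script, it prints `0xffffffff80000000 0x80000000`: the two values differ only above bit 31, which is exactly the part that is ignored when the upper 32 bits aren't used.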