61091 Commits

Author SHA1 Message Date
AZero13
733c1aded1
[ARM] Replace ABS and tABS machine nodes with custom lowering (#156717)
Just do a custom lowering instead.

Also copy paste the cmov-neg fold to prevent regressions in nabs.
2025-09-19 19:43:36 +01:00
Stanislav Mekhanoshin
8fcb712167
[AMDGPU] gfx1250 runlines for global-atomicrmw-fadd.ll. NFC (#159817) 2025-09-19 10:58:41 -07:00
Sam Clegg
cac54a8ad0
[WebAssembly] Require tags for Wasm EH and Wasm SJLJ to be defined externally (#159143)
Rather than defining these tags in each object file that requires them,
we can declare them as undefined and require that they be defined
externally in, for example, compiler-rt or libcxxabi.
2025-09-19 10:11:15 -07:00
Akash Dutta
c256966fe2
[AMDGPU]: Unpack packed instructions overlapped by MFMAs post-RA scheduling (#157968)
This is a cleaned up version of PR #151704. These optimizations are now
performed post-RA scheduling.
2025-09-19 09:41:02 -07:00
Craig Topper
6119d1f115
[RISCV] Re-work how VWADD_W_VL and similar _W_VL nodes are handled in combineOp_VLToVWOp_VL. (#159205)
These instructions have one already narrow operand. Previously, we
pretended like this operand was a supported extension.

This could cause problems when we called getOrCreateExtendedOp on this
narrow operand when creating the VWADD_VL. If the narrow operand
happened to be an extend of the opposite type, we would peek through it
and then rebuild it with the wrong extension type. So (vwadd_w_vl (i32
(sext X)), (i16 (zext Y))) would become (vwadd_vl (i16 (sext X)), (i16
(sext Y))).

To prevent this, we ignore the operand instead and pass std::nullopt for
SupportsExt to getOrCreateExtendedOp so it won't peek through any
extends on the narrow source.

Fixes #159152.
2025-09-19 09:19:57 -07:00
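The wrong-extension problem described in #159205 above can be sketched with a scalar model. This is an illustration only, not the backend code: the element widths (i8 sources, i16 narrow operand, i32 result) and the helper names `sext`, `zext`, `wrap`, `vwadd_w` and `vwadd` are assumptions made for the example.

```python
def sext(value, from_bits):
    """Sign-extend the low `from_bits` bits of `value` to a Python int."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign


def zext(value, from_bits):
    """Zero-extend the low `from_bits` bits of `value`."""
    return value & ((1 << from_bits) - 1)


def wrap(value, bits):
    """Truncate to `bits` bits, like a fixed-width vector lane."""
    return value & ((1 << bits) - 1)


def vwadd_w(wide_i32, narrow_i16):
    # Hypothetical model of VWADD_W_VL: the narrow operand is sign-extended.
    return wrap(wide_i32 + sext(narrow_i16, 16), 32)


def vwadd(lhs_i16, rhs_i16):
    # Hypothetical model of VWADD_VL: both narrow operands are sign-extended.
    return wrap(sext(lhs_i16, 16) + sext(rhs_i16, 16), 32)


x_i8, y_i8 = 0x01, 0x80  # Y has its top bit set, so sext(Y) != zext(Y)

# (vwadd_w_vl (i32 (sext X)), (i16 (zext Y)))
original = vwadd_w(wrap(sext(x_i8, 8), 32), zext(y_i8, 8))

# (vwadd_vl (i16 (sext X)), (i16 (sext Y))): the zext was rebuilt as a sext
miscombined = vwadd(wrap(sext(x_i8, 8), 16), wrap(sext(y_i8, 8), 16))

print(hex(original), hex(miscombined))
```

The two results differ whenever the zero-extended source has its top bit set, which is why peeking through the extend on the narrow operand is unsafe.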
RolandF77
1eb575dcae
[PowerPC] Fix vector extend result types in BUILD_VECTOR lowering (#159398)
The result type of the vector extend intrinsics generated by the
BUILD_VECTOR lowering code should match how they are actually defined.
Currently the result type is defaulting to the operand type there. This
can conflict with calls to the same intrinsic from other paths.
2025-09-19 10:43:22 -04:00
Jeffrey Byrnes
ac8f3cdcf3 [AMDGPU] Precommit test for memory intrinsics CGP handling
Change-Id: Id229f849b1d8552bbe59d6e18114042ef1614fad
2025-09-19 07:42:26 -07:00
zhijian lin
be6c4d933d
[PowerPC] Use millicode call for strlen instead of lib call (#153600)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strlen routine is a millicode implementation;
we use millicode for the strlen function instead of a library call to
improve performance.
2025-09-19 10:02:21 -04:00
Mikhail Gudim
562146499c
[CodeGen][NewPM] Port ReachingDefAnalysis to new pass manager. (#159572)
In this commit:
  (1) Added new pass manager support for `ReachingDefAnalysis`.
  (2) Added printer pass.
  (3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
2025-09-19 09:38:34 -04:00
Simon Pilgrim
188c7ed171
[X86] Add test coverage for #159670 (#159767) 2025-09-19 13:09:32 +00:00
Paul Walker
b7e4edca3d
[LLVM][CodeGen] Update PPCFastISel::SelectRet for ConstantInt based vectors. (#159331)
The current implementation assumes ConstantInt return values are scalar,
which is not true when use-constant-int-for-fixed-length-splat is
enabled.
2025-09-19 13:15:57 +01:00
Mariusz Sikora
eed99d5008
[AMDGPU] Fix the magic number RegisterClass for SReg_32 in test (#159761) 2025-09-19 14:14:33 +02:00
Hongyu Chen
fba55c89c3
[X86] Fold X * 1 + Z --> X + Z for VPMADD52L (#158516)
This patch implements the fold `lo(X * 1) + Z --> lo(X) + Z --> X + Z iff X
== lo(X)`.
2025-09-19 19:35:05 +08:00
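The condition on the VPMADD52L fold in #158516 above can be checked with a one-lane model. This is a sketch assuming the usual VPMADD52LUQ behaviour (multiply the low 52 bits of both sources, add the low 52 bits of the product to a 64-bit accumulator); the function names are hypothetical.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def vpmadd52l(z, x, y):
    """One 64-bit lane: z + lo52(x[51:0] * y[51:0]), wrapped to 64 bits."""
    product = (x & MASK52) * (y & MASK52)
    return (z + (product & MASK52)) & MASK64


def folded(z, x):
    """The folded form X + Z, wrapped to 64 bits."""
    return (x + z) & MASK64


def fits_in_52_bits(x):
    """The `X == lo(X)` side condition from the commit message."""
    return x == (x & MASK52)


z = 5
small_x = MASK52     # largest value that still fits in 52 bits
large_x = 1 << 52    # bit 52 set, so lo(X) drops it

print(fits_in_52_bits(small_x), vpmadd52l(z, small_x, 1) == folded(z, small_x))
print(fits_in_52_bits(large_x), vpmadd52l(z, large_x, 1) == folded(z, large_x))
```

When the upper 12 bits of X are not known to be zero, the multiply silently discards them, so the fold would change the result.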
AZero13
a05e8d506b
[X86] Allow all legal integers to optimize smin with 0 (#151893)
There is no reason for smin to be limited to 32 and 64 bits.

hasAndNot only exists for 32 and 64 bits, so this does not affect smax.
2025-09-19 11:08:06 +00:00
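The message for #151893 above does not spell out the fold. Assuming it is the usual sign-mask identity, where smin with 0 becomes an AND with the sign mask and smax with 0 becomes an AND-NOT (which would explain the `hasAndNot` remark), a width-independent sketch looks like this. The function names are hypothetical.

```python
def smin_zero(x, bits):
    """smin(x, 0) without a compare: x & (x >> (bits - 1))."""
    sign_mask = x >> (bits - 1)  # arithmetic shift: -1 if x < 0, else 0
    return x & sign_mask


def smax_zero(x, bits):
    """smax(x, 0) via and-not: x & ~(x >> (bits - 1))."""
    sign_mask = x >> (bits - 1)
    return x & ~sign_mask


# Exhaustive check for 8-bit and 16-bit signed values.
for bits in (8, 16):
    for x in range(-(1 << (bits - 1)), 1 << (bits - 1)):
        assert smin_zero(x, bits) == min(x, 0)
        assert smax_zero(x, bits) == max(x, 0)
```

Nothing in the identity depends on the width being 32 or 64, which matches the commit's argument for allowing all legal integer types.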
Fabian Ritter
d5607694e1
[AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (#146075)
If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.

This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which AMDGPU for instance needs when
folding constant offsets.

For SWDEV-516125.
2025-09-19 11:58:41 +02:00
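The arithmetic behind #146075 above: when base and offset share no set bits, addition cannot carry, so it equals a bitwise OR, and a small offset only touches the low 32-bit half. The following is a sketch with hypothetical names and values, not the DAGCombine itself.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def ptradd_via_32bit_adds(base, offset):
    """64-bit add built from two 32-bit adds linked by a carry."""
    lo = (base & MASK32) + (offset & MASK32)
    carry = lo >> 32
    hi = ((base >> 32) + (offset >> 32) + carry) & MASK32
    return (hi << 32) | (lo & MASK32)


def ptradd_via_disjoint_or(base, offset):
    """Single 32-bit OR on the low half; only valid under the checks below."""
    assert offset <= MASK32, "offset must fit in the low 32 bits"
    assert (base & offset) == 0, "operands must have no common set bits"
    lo = (base & MASK32) | offset
    return (base & (MASK64 ^ MASK32)) | lo


base = 0x0000_7F12_3456_0000  # hypothetical 64 KiB-aligned pointer
offset = 0xFFFC               # hypothetical offset below that alignment

print(hex(ptradd_via_32bit_adds(base, offset)))
print(hex(ptradd_via_disjoint_or(base, offset)))
```

The OR form no longer records that the operation is pointer arithmetic, which is the information loss the commit message warns about.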
UmeshKalappa
b59d410202
RISC-V: builtins support for MIPS RV64 P8700 execution control.
The following changes are made:

a) Typo fix (with previous PR https://github.com/llvm/llvm-project/pull/155747)
b) Builtins support for MIPS P8700 execution control instructions
c) Testcase
2025-09-19 15:10:28 +05:30
Fabian Ritter
a2dcc88f39
[AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (#145330)
There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp
that check for ISD::ADD in a pointer context, but as far as I can tell
those are only relevant for 32-bit pointer arithmetic (like frame
indices/scratch addresses and LDS), for which we don't enable PTRADD
generation yet.

For SWDEV-516125.
2025-09-19 10:19:38 +02:00
Jim Lin
e747223c03
[RISCV] Implement MC support for Zvfofp8min extension (#157014)
This patch adds MC support for Zvfofp8min
https://github.com/aswaterman/riscv-misc/blob/main/isa/zvfofp8min.adoc.
2025-09-19 07:49:31 +00:00
Fabian Ritter
adfa6a4c14
[AMDGPU][SDAG] Test ISD::PTRADD handling in various special cases (#145329)
Pre-committing tests to show improvements in a follow-up PR.
2025-09-19 09:43:30 +02:00
David Green
ebe7587256 [AArch64] Add some tests for bitcast vector loads and scalarizing loaded vectors. NFC 2025-09-19 07:49:22 +01:00
Jianjian Guan
332eb5f693
[RISCV][GISel] Support select vx, vf form rvv intrinsics (#157398)
For the vx form, we legalize it by widening the scalar. For the vf form, we select the right register bank.
2025-09-19 14:30:48 +08:00
ZhaoQi
680c657a4f
[LoongArch] Simply fix extractelement on LA32 (#159564) 2025-09-19 14:14:55 +08:00
Luke Lau
7a77127c0f
[RISCV] Ignore debug instructions in RISCVVLOptimizer (#159616)
Don't put them onto the worklist, since they'll crash when we try to
check their opcode.

Fixes #159422
2025-09-19 12:22:44 +08:00
ZhaoQi
1ad5d63e5e
[LoongArch] Add generation support for [x]vnori.b (#158772) 2025-09-19 09:34:11 +08:00
Matt Arsenault
116ca9522e
Greedy: Take copy hints involving subregisters (#159570)
Previously this would only accept full copy hints. This relaxes
this to accept some subregister copies. Specifically, this now
accepts:
  - Copies to/from physical registers if there is a compatible
    super register
  - Subreg-to-subreg copies

This has the potential to repeatedly add the same hint to the
hint vector, but not sure if that's a real problem.
2025-09-19 09:37:36 +09:00
Matt Arsenault
33e8e5a846
AMDGPU: Add more mfma loop test cases (#159492)
Test cases where the exit uses must be VGPRs,
and don't happen to be a store that could use AGPRs.
2025-09-19 09:36:46 +09:00
Craig Topper
0c1ab02e46
[RISCV] Use bseti 31 for (or X, -2147483648) when upper 32 bits aren't used. (#159678)
If the original type was i32, type legalization will sign extend
the constant. This prevents it from having a single bit set or clear
so other patterns can't match. If the upper bits aren't used, we
can ignore the sign extension.

Similar for bclri and binvi.
2025-09-18 17:33:08 -07:00
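The reasoning in #159678 above can be illustrated numerically. This sketch assumes RV64 (64-bit registers) and models `bseti` as setting a single bit; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sext32_to_64(value):
    """Sign-extend a 32-bit constant to 64 bits, as type legalization does."""
    value &= MASK32
    if value & 0x80000000:
        return value | (MASK64 ^ MASK32)
    return value


def bseti(x, bit):
    """Model of bseti: set one bit of a 64-bit register."""
    return (x | (1 << bit)) & MASK64


legalized_imm = sext32_to_64(0x80000000)  # i32 -2147483648 after legalization
set_bits = bin(legalized_imm).count("1")  # more than one bit is set

x = 0x0000_0001_1234_5678
or_result = (x | legalized_imm) & MASK64
bseti_result = bseti(x, 31)

print(hex(legalized_imm), set_bits)
print(hex(or_result & MASK32), hex(bseti_result & MASK32))
```

The two results differ only in the upper 32 bits, so the substitution is safe exactly when those bits are unused.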
Sam Elliott
dda7ce6624
[RISCV] Move Xqci Select-likes to use riscv_selectcc (#153147)
The original patterns for the Xqci select-like instructions used
`select`, and marked that ISD node as legal. This is not the usual way
that `select` is dealt with in the RISC-V backend.

Usually on RISC-V, we expand `select` to `riscv_select_cc` which holds
references to the operands of the comparison and the possible values
depending on the comparison. In retrospect, this is a much better fit
for our instructions, as most of them correspond to specific condition
codes, rather than more generic `select` with a truthy/falsey value.

This PR moves the Xqci select-like patterns to use `riscv_select_cc`
nodes. This applies to the Xqcicm, Xqcics and Xqcicli instruction
patterns.

In order to match the existing codegen, minor additions had to be made
to `translateSetCCForBranch` to ensure that comparisons against specific
immediate values are left in a form that can be matched more closely by
the instructions. This prevents having to insert additional `li`
instructions and use the register forms.

There are a few slight regressions:
- There are sometimes more `mv` instructions than entirely necessary. I
believe these would not be seen with larger examples where the register
allocator has more leeway.
- In some tests where just one of the three extensions is enabled,
codegen falls back to using a branch over a move. With all three
extensions enabled (the configuration we most care about), these are not
seen.
- The generated patterns are very similar to each other - they have
similar complexity (7 or 8) and there are still overlaps. Sometimes the
choice between two instructions can be affected by the order of the
patterns in the tablegen file.

One other change is that Xqcicm instructions are prioritised over Xqcics
instructions where they have identical patterns. This is done because
one of the Xqcicm instructions is compressible (`qc.mveqi`), while
none of the Xqcics instructions are.
2025-09-19 00:16:44 +00:00
Stanislav Mekhanoshin
6ac0abf8c4
[AMDGPU] gfx1251 VOP3 dpp support (#159654) 2025-09-18 16:18:09 -07:00
Stanislav Mekhanoshin
8cfbace7b2
[AMDGPU] gfx1251 VOP2 dpp support (#159641) 2025-09-18 15:38:29 -07:00
Stanislav Mekhanoshin
e3c7b7f806
[AMDGPU] gfx1251 VOP1 dpp support (#159637) 2025-09-18 13:42:06 -07:00
Shaoce SUN
c3383d74a7
[RISCV][GlobalIsel] Remove redundant sext.w for ADDIW (#159597)
This is the minimal case generated by clang at `-O0`; I'm not sure if
writing the test this way is appropriate.
2025-09-18 17:29:54 +00:00
Alexey Karyakin
bbcb5f421d
Shuffle patterns to vdeal + vpack (#159464)
Lowering shuffle patterns to vdeal + vpack caused an assertion because
the vdeal parameter value is negative but an unsigned one was expected.
2025-09-18 11:55:46 -05:00
Simon Pilgrim
6e47bff24d
[AMDGPU] callee-special-input-vgprs.ll / callee-special-input-vgprs-packed.ll - regenerate test coverage (#159587) 2025-09-18 15:19:48 +00:00
Fabian Ritter
01b4b2a5b8
[AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (#143881)
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
2025-09-18 15:01:07 +02:00
Petar Avramovic
2ec7959b96
[AMDGPU][SIInsertWaitcnts] Track SCC. Insert KM_CNT waits for SCC writes. (#157843)
Add new event SCC_WRITE for s_barrier_signal_isfirst and s_barrier_leave,
instructions that write to SCC; the counter is KM_CNT.
Also start tracking SCC for reads and writes.
s_barrier_wait on the same barrier guarantees that the SCC write from
s_barrier_signal_isfirst has landed, so there is no need to insert s_wait_kmcnt.
2025-09-18 14:41:01 +02:00
Simon Pilgrim
85527609a0
[AMDGPU] kernel-argument-dag-lowering.ll - regenerate test coverage (#159526) 2025-09-18 09:34:38 +00:00
Simon Pilgrim
573b3775e4
[X86] Add test coverage for #158649 (#159524)
Demonstrates the failure to keep avx512 mask predicate bit manipulation
patterns (based off the BMI1/BMI2/TBM style patterns) on the predicate
registers. Unless the pattern is particularly complex, the cost of
transferring to/from gpr outweighs any gains from better scalar
instructions.

I've been rather random with the mask types for the tests; I can adjust
later on if there are particular cases of interest.
2025-09-18 09:33:10 +00:00
David Green
d76d0a5139 [AArch64] Regenerate and update a number of check lines. NFC 2025-09-18 09:54:47 +01:00
yingopq
ddf0f6fe91
Revert "[Mips] Fix atomic min/max generate mips4 instructions when compiling for mips2" (#159495)
Reverts llvm/llvm-project#149983
2025-09-18 15:07:15 +08:00
Jim Lin
8548fa00f1
[RISCV] Match fmaxnum and fminnum to reduction ops. (#159244)
This patch tries to match fmaxnum and fminnum to vector reductions.
2025-09-18 11:11:52 +08:00
Boyao Wang
27f8f9e1f1
[RISCV][CodeGen] Add CodeGen support of Zibi experimental extension (#146858)
This adds the CodeGen support of Zibi v0.1 experimental extension, which
depends on #127463.
2025-09-18 11:03:48 +08:00
woruyu
1a172b9924
[RISCV][GISel] Lower G_SSUBE (#157855)
### Summary
Try to implement lowering of G_SSUBE in LegalizerHelper::lower
2025-09-18 10:08:56 +08:00
hev
7ca448e479
[LoongArch] Fix MergeBaseOffset for constant pool index operand (#159336)
Fixes #159200
2025-09-18 10:06:33 +08:00
Craig Topper
38f2a1cb9b
[RISCV][GISel] Test legalizing s64 G_UADDE on RV32. And s128 on RV64. NFC (#159412) 2025-09-17 17:23:28 -07:00
Stanislav Mekhanoshin
221f8eef9d
[AMDGPU] Add gfx1251 runlines to cooperative atomics tests. NFC (#159437) 2025-09-17 14:08:05 -07:00
Björn Pettersson
1c4c7bd808
[SelectionDAG] Deal with POISON for INSERT_VECTOR_ELT/INSERT_SUBVECTOR (#143102)
As reported in https://github.com/llvm/llvm-project/issues/141034
SelectionDAG::getNode had some unexpected
behaviors when trying to create vectors with UNDEF elements. Since
we treat both UNDEF and POISON as undefined (when using isUndef())
we can't just fold away INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on
isUndef(), as that could make the resulting vector more poisonous.

Same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR.

Here are some examples:

This fold was done even if vec[idx] was POISON:
  INSERT_VECTOR_ELT vec, UNDEF, idx -> vec

This fold was done even if any of vec[idx..idx+size] was POISON:
  INSERT_SUBVECTOR vec, UNDEF, idx -> vec

This fold was done even if the elements not extracted from vec could
be POISON:
  sub = EXTRACT_SUBVECTOR vec, idx
  INSERT_SUBVECTOR UNDEF, sub, idx -> vec

With this patch we avoid such folds unless we can prove that the
result isn't more poisonous when eliminating the insert.

Fixes https://github.com/llvm/llvm-project/issues/141034
2025-09-17 21:04:00 +00:00
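The first fold listed in #143102 above can be modelled with sentinel values to show why it makes the result "more poisonous". This is a toy model, not SelectionDAG code: `UNDEF`, `POISON` and every function name here are illustrative assumptions, and the patch's actual check is not shown in the message.

```python
UNDEF = "undef"
POISON = "poison"


def insert_vector_elt(vec, elt, idx):
    """INSERT_VECTOR_ELT vec, elt, idx on a list-based vector."""
    out = list(vec)
    out[idx] = elt
    return out


def is_more_poisonous(folded, original):
    """True if folding introduced POISON in a lane that was not POISON."""
    return any(f == POISON and o != POISON for f, o in zip(folded, original))


def fold_insert_of_undef(vec, idx):
    """Return vec if dropping the insert is safe, else None (keep the node)."""
    if vec[idx] == POISON:
        return None
    return list(vec)


vec = [1, POISON, 3]
unfolded = insert_vector_elt(vec, UNDEF, 1)  # lane 1 becomes undef
naive_fold = list(vec)                       # old behaviour: lane 1 stays poison

print(is_more_poisonous(naive_fold, unfolded))
print(fold_insert_of_undef(vec, 1), fold_insert_of_undef([1, 2, 3], 1))
```

The guarded version only drops the insert when the overwritten lane is known not to be POISON, mirroring the "unless we can prove" wording of the message.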
Stanislav Mekhanoshin
e556dc0b23
[AMDGPU] Add gfx1251 subtarget (#159430) 2025-09-17 13:02:02 -07:00
Ying Wang
4bac9d4911
[RISCV] Add isel for bitcasting between bfloat and half types (#158828)
There is no RISCV isel for bitcasts between f16 and bf16, which
triggers a "cannot select" fatal error.

Co-authored-by: Ying Wang <wy446777@alibaba-inc.com>
2025-09-17 12:10:47 -07:00
Vladislav Dzhidzhoev
432b58915a
[DebugInfo][DwarfDebug] Separate creation and population of abstract subprogram DIEs (#159104)
With this change, construction of abstract subprogram DIEs is split in
two stages/functions:
creation of DIE (in DwarfCompileUnit::getOrCreateAbstractSubprogramDIE)
and its population with children (in
DwarfCompileUnit::constructAbstractSubprogramScopeDIE).
With that, abstract subprograms can be created/referenced from
DwarfDebug::beginModule, which should solve the issue with static local
variables DIE creation of inlined functions with optimized-out
definitions. It fixes https://github.com/llvm/llvm-project/issues/29985.

LexicalScopes class now stores mapping from DISubprograms to their
corresponding llvm::Function's. It is supposed to be built before
processing of each function (so, now LexicalScopes class has a method
for "module initialization" alongside the method for "function
initialization"). It is used by DwarfCompileUnit to determine whether a
DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is
invoked.

DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can
create an abstract or a concrete DIE for a subprogram. It accepts
llvm::Function* argument to determine whether a concrete DIE must be
created.

This is a temporary fix for
https://github.com/llvm/llvm-project/issues/29985. Ideally, it will be
fixed by moving global variables and types emission to
DwarfDebug::endModule (https://reviews.llvm.org/D144007,
https://reviews.llvm.org/D144005).

Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in
https://github.com/llvm/llvm-project/pull/90523 was taken for this
commit.
2025-09-17 20:06:49 +02:00