38377 Commits

Author SHA1 Message Date
Sam Clegg
cac54a8ad0
[WebAssembly] Require tags for Wasm EH and Wasm SJLJ to be defined externally (#159143)
Rather than defining these tags in each object file that requires them,
we can declare them as undefined and require that they be defined
externally in, for example, compiler-rt or libcxxabi.
2025-09-19 10:11:15 -07:00
zhijian lin
be6c4d933d
[PowerPC] using milicode call for strlen instead of lib call (#153600)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strlen routine is a millicode implementation;
we use millicode for the strlen function instead of a library call to
improve performance.
2025-09-19 10:02:21 -04:00
Mikhail Gudim
562146499c
[CodeGen][NewPM] Port ReachingDefAnalysis to new pass manager. (#159572)
In this commit:
  (1) Added new pass manager support for `ReachingDefAnalysis`.
  (2) Added printer pass.
  (3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
2025-09-19 09:38:34 -04:00
Matt Arsenault
6b54c92be0
CodeGen: Add RegisterClass by HwMode (#158269)
This is a generalization of the LookupPtrRegClass mechanism.
AMDGPU has several use cases for swapping the register class of
instruction operands based on the subtarget, but none of them
really fit into the box of being pointer-like.

The current system requires manual management of an arbitrary integer
ID. For the AMDGPU use case, this would end up being around 40 new
entries to manage.

This just introduces the base infrastructure. I have ports of all
the target-specific uses of PointerLikeRegClass ready.
2025-09-19 20:08:51 +09:00
Fabian Ritter
d5607694e1
[AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (#146075)
If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.
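
The soundness condition behind this combine is that an addition whose operands share no set bits can never carry, so `a + b` equals `a | b`. A minimal C++ sketch of that identity, assuming plain 64-bit unsigned integers stand in for the pointer and the offset; `haveNoCommonBits` and `addAsDisjointOr` are hypothetical helpers, not SelectionDAG APIs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: operands are "disjoint" when no bit is set in both.
bool haveNoCommonBits(uint64_t a, uint64_t b) { return (a & b) == 0; }

// Hypothetical helper: compute base + offset as an OR. This is only valid for
// disjoint operands, because then no bit position produces a carry.
uint64_t addAsDisjointOr(uint64_t base, uint64_t offset) {
  assert(haveNoCommonBits(base, offset) && "OR equals ADD only for disjoint operands");
  return base | offset;
}
```

If the offset only has bits set in the low 32 bits, the high half of the OR is just the high half of the base, which is why a single 32-bit OR can stand in for an add/add-with-carry pair.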

This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which AMDGPU for instance needs when
folding constant offsets.

For SWDEV-516125.
2025-09-19 11:58:41 +02:00
Fabian Ritter
771c94c8db
[SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms (#146074)
This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds,
that targets can use to allow transformations to introduce out-of-bounds
pointer arithmetic. It also moves two such transformations from the
AMDGPU-specific DAG combines to the generic DAGCombiner.

This is motivated by target features like AArch64's checked pointer
arithmetic, CPA, which does not tolerate the introduction of
out-of-bounds pointer arithmetic.
2025-09-19 11:07:59 +02:00
Fabian Ritter
a2dcc88f39
[AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (#145330)
There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp
that check for ISD::ADD in a pointer context, but as far as I can tell
those are only relevant for 32-bit pointer arithmetic (like frame
indices/scratch addresses and LDS), for which we don't enable PTRADD
generation yet.

For SWDEV-516125.
2025-09-19 10:19:38 +02:00
Matt Arsenault
116ca9522e
Greedy: Take copy hints involving subregisters (#159570)
Previously this would only accept full copy hints. This relaxes
this to accept some subregister copies. Specifically, this now
accepts:
  - Copies to/from physical registers if there is a compatible
    super register
  - Subreg-to-subreg copies

This has the potential to repeatedly add the same hint to the
hint vector, but it is not clear whether that is a real problem.
2025-09-19 09:37:36 +09:00
Qiu Chaofan
e8311f8ebc
[DebugInfo] Emit skeleton to avoid mismatching inlining flags (#153568)
This actually reverts 418120556398c01550d42500d56e6d328290185b.

The original commit omitted the unit when all of its symbols were
inlined into the current one, which led to a crash when a module using
split-dwarf inlined a function from another module with a mismatched
split-dwarf-inlining option. This revert guarantees that DIEs are created
in both the DWO and the skeleton sections whenever split-dwarf is active.
2025-09-18 12:46:10 -07:00
Scott Linder
ad68e5d56c
[LiveDebugVariables] Use bundle-aware iterators consistently (#159471)
Most of the pass works in terms of MachineBasicBlock::iterator
(MachineInstrBundleIterator), but here one is constructed from an
arbitrary instruction which may be within a bundle, causing an
assertion failure.
2025-09-18 10:47:07 -04:00
Abhishek Kaushik
98ebb64a16
[NFC][MIRPrinter] Use std::move to avoid copy (#157832) 2025-09-18 14:40:41 +05:30
woruyu
1a172b9924
[RISCV][GISel] Lower G_SSUBE (#157855)
### Summary
Implement lowering of G_SSUBE in LegalizerHelper::lower.
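
G_SSUBE subtracts with a borrow-in and, as we understand the opcode, reports signed overflow as its second result. A minimal C++ sketch of those semantics on 8-bit values, assuming a lowering to a wrapping subtraction plus sign-bit tests; `ssube8` and `ssubeMatchesWideArithmetic` are hypothetical names, and this is not the code in LegalizerHelper:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model of G_SSUBE on i8: returns {a - b - borrowIn (wrapped), overflow}.
std::pair<int8_t, bool> ssube8(int8_t a, int8_t b, bool borrowIn) {
  // Wrapping subtraction in unsigned arithmetic avoids signed-overflow UB.
  uint8_t diff = static_cast<uint8_t>(static_cast<uint8_t>(a) - static_cast<uint8_t>(b) -
                                      (borrowIn ? 1u : 0u));
  int8_t res = static_cast<int8_t>(diff);
  // Overflow only if the operands' signs differ and the result's sign
  // differs from the minuend's sign.
  bool overflow = ((a ^ b) & (a ^ res)) < 0;
  return {res, overflow};
}

// Compare against exact arithmetic in a wider type for every i8 input.
bool ssubeMatchesWideArithmetic() {
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b)
      for (int borrow = 0; borrow <= 1; ++borrow) {
        int exact = a - b - borrow;
        auto r = ssube8(static_cast<int8_t>(a), static_cast<int8_t>(b), borrow != 0);
        bool expectOverflow = exact < INT8_MIN || exact > INT8_MAX;
        if (r.second != expectOverflow ||
            static_cast<uint8_t>(r.first) != static_cast<uint8_t>(exact))
          return false;
      }
  return true;
}
```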
2025-09-18 10:08:56 +08:00
hev
7ca448e479
[LoongArch] Fix MergeBaseOffset for constant pool index operand (#159336)
Fixes #159200
2025-09-18 10:06:33 +08:00
Björn Pettersson
1c4c7bd808
[SelectionDAG] Deal with POISON for INSERT_VECTOR_ELT/INSERT_SUBVECTOR (#143102)
As reported in https://github.com/llvm/llvm-project/issues/141034,
SelectionDAG::getNode had some unexpected behaviors when trying to
create vectors with UNDEF elements. Since
we treat both UNDEF and POISON as undefined (when using isUndef())
we can't just fold away INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on
isUndef(), as that could make the resulting vector more poisonous.

Same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR.

Here are some examples:

This fold was done even if vec[idx] was POISON:
  INSERT_VECTOR_ELT vec, UNDEF, idx -> vec

This fold was done even if any of vec[idx..idx+size] was POISON:
  INSERT_SUBVECTOR vec, UNDEF, idx -> vec

This fold was done even if the elements not extracted from vec could
be POISON:
  sub = EXTRACT_SUBVECTOR vec, idx
  INSERT_SUBVECTOR UNDEF, sub, idx -> vec

With this patch we avoid such folds unless we can prove that the
result isn't more poisonous when eliminating the insert.

Fixes https://github.com/llvm/llvm-project/issues/141034
2025-09-17 21:04:00 +00:00
Ramkumar Ramachandra
7fb3a91418
[PatternMatch] Introduce match functor (NFC) (#159386)
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
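
The idiom in question is passing a lambda that only forwards to `match` into an algorithm such as `all_of`. A self-contained C++ sketch of the functor idea, using a toy pattern over `int` in place of LLVM's `Value *` and real matchers; `IsEven`, `MatchFunctor`, and `matchFn` are hypothetical names, not the API added by the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for a PatternMatch pattern: anything with a const match(V) member.
struct IsEven {
  bool match(int V) const { return V % 2 == 0; }
};

// Hypothetical functor that binds a pattern so it can be handed directly
// to algorithms such as std::all_of instead of writing a wrapping lambda.
template <typename Pattern> struct MatchFunctor {
  Pattern P;
  bool operator()(int V) const { return P.match(V); }
};

// Hypothetical factory that deduces the pattern type.
template <typename Pattern> MatchFunctor<Pattern> matchFn(const Pattern &P) {
  return MatchFunctor<Pattern>{P};
}

// Before: std::all_of(Vals.begin(), Vals.end(), [](int V) { return IsEven().match(V); });
// After:  std::all_of(Vals.begin(), Vals.end(), matchFn(IsEven()));
```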

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-17 21:04:33 +01:00
Vladislav Dzhidzhoev
432b58915a
[DebugInfo][DwarfDebug] Separate creation and population of abstract subprogram DIEs (#159104)
With this change, construction of abstract subprogram DIEs is split in
two stages/functions:
creation of DIE (in DwarfCompileUnit::getOrCreateAbstractSubprogramDIE)
and its population with children (in
DwarfCompileUnit::constructAbstractSubprogramScopeDIE).
With that, abstract subprograms can be created/referenced from
DwarfDebug::beginModule, which should solve the issue with creating DIEs
for static local variables of inlined functions whose definitions were
optimized out. It fixes https://github.com/llvm/llvm-project/issues/29985.

The LexicalScopes class now stores a mapping from DISubprograms to their
corresponding llvm::Function objects. It is supposed to be built before
processing of each function (so, now LexicalScopes class has a method
for "module initialization" alongside the method for "function
initialization"). It is used by DwarfCompileUnit to determine whether a
DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is
invoked.

A DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can
create an abstract or a concrete DIE for a subprogram. It accepts
llvm::Function* argument to determine whether a concrete DIE must be
created.

This is a temporary fix for
https://github.com/llvm/llvm-project/issues/29985. Ideally, it will be
fixed by moving global variables and types emission to
DwarfDebug::endModule (https://reviews.llvm.org/D144007,
https://reviews.llvm.org/D144005).

Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in
https://github.com/llvm/llvm-project/pull/90523 was taken for this
commit.
2025-09-17 20:06:49 +02:00
Simon Pilgrim
57d67bec6d
[DAG] getNode() - reuse result type instead of calling getValueType again. NFC. (#159381)
We have assertions above confirming VT == N1.getValueType() for INSERT_VECTOR_ELT nodes.
2025-09-17 15:52:09 +00:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
Matt Arsenault
1dbb932fd8
GlobalISel: Relax verifier between physreg and typed vreg (#159281)
Accept mismatched register size and type size if the type is legal
for the register class.

For AMDGPU, boolean registers have two possible interpretations depending
on the use context type; e.g., these are both equally valid:

  %0:_(s1) = COPY $vcc
  %1:_(s64) = COPY $vcc

vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit
value depending on the use context. SelectionDAG has never required an
exact match between the register size and the used value type. You can
assign a type with a smaller size to a larger register class. Relax the
verifier to match. There are several hacks holding together these copies
in various places, and this is preparation to remove one of them.

The x86 test change is from what I would consider an X86 usage bug. X86
defines an FR32 register class and F16 register class, but the F16
register class is functionally an alias of F32 with the same members and
size. There's no need to have the F16 class.
2025-09-17 19:43:50 +09:00
Mingming Liu
8b3c91c4fb
Re-apply "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159161)
This is a reland of https://github.com/llvm/llvm-project/pull/158460

Test failures are gone once I undo the changes in codegenprepare.
2025-09-16 20:33:29 +00:00
Mingming Liu
9277bcd1ab
Revert "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159159)
Reverts llvm/llvm-project#158460 due to buildbot failures
2025-09-16 12:51:54 -07:00
Mingming Liu
027bccc469
[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional update if existing prefix is not equivalent to the new one. Returns whether prefix changed. (#158460)
Before this change, `setSectionPrefix` overwrote the existing section
prefix with the new one unconditionally.

After this change, `setSectionPrefix` checks for equivalence, updates
conditionally, and returns whether an update happened.

Update the existing callers to make use of the return value.
[PR 155337](https://github.com/llvm/llvm-project/pull/155337/files#diff-cc0c67ac89807f4453f0cfea9164944a4650cd6873a468a0f907e7158818eae9)
is a motivating use case where the 'update' semantic is needed.
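
The new contract (update only when the prefix actually changes, and report whether it did) can be sketched with a simplified stand-in. This assumes the prefix is a plain optional string and that "equivalent" means equal; `ToyGlobal` is hypothetical and the real GlobalObject implementation differs:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified model of a global with an optional section prefix.
class ToyGlobal {
  std::optional<std::string> SectionPrefix;

public:
  // Returns true only if the stored prefix was actually changed.
  bool setSectionPrefix(const std::string &Prefix) {
    if (SectionPrefix && *SectionPrefix == Prefix)
      return false; // Equivalent prefix already present: leave it alone.
    SectionPrefix = Prefix;
    return true;
  }

  std::optional<std::string> getSectionPrefix() const { return SectionPrefix; }
};
```

Callers can then use the return value, for example to decide whether they modified anything.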
2025-09-16 12:01:21 -07:00
Craig Topper
f209d63b04
[SelectionDAGBuilder][PPC] Use getShiftAmountConstant. (#158400)
The PowerPC changes are caused by shifts created by different IR
operations being CSEd now. This allows consecutive loads to be turned
into vectors earlier. This has effects on the ordering of other combines
and legalizations. This leads to some improvements and some regressions.
2025-09-16 10:26:49 -07:00
guan jian
6aab826e23
[DAGCombiner] add fold (xor (smin(x, C), C)) and fold (xor (smax(x, C), C)) (#155141)
Hi, I compared the AArch64 code that GCC and Clang generate for the following function, and there is a small difference between the two. The LLVM IR is:
```
define i64 @test_smin_neg_one(i64 %a) {
  %1 = tail call i64 @llvm.smin.i64(i64 %a, i64 -1)
  %retval.0 = xor i64 %1, -1
  ret i64 %retval.0
}
```
GCC generates:
```
	cmp	x0, 0
	csinv	x0, xzr, x0, ge
	ret
```
Clang generates:
```
	cmn	x0, #1
	csinv	x8, x0, xzr, lt
	mvn	x0, x8
	ret
```
Clang computes the smin into x8 and then needs a separate `mvn` to invert it, whereas GCC folds the inversion into a single `csinv`.
So I added the following folds to DAGCombiner:
fold (xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0
fold (xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0

alive2: https://alive2.llvm.org/ce/z/gffoir
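
Both folds can also be checked exhaustively at a small width. A brute-force C++ sketch over every pair of 8-bit values of x and C; the function names are hypothetical, and the alive2 link remains the actual proof:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Left-hand side: xor(smax(x, C), C).
int8_t xorSmax(int8_t x, int8_t c) { return static_cast<int8_t>(std::max(x, c) ^ c); }
// Right-hand side: select(x > C, xor(x, C), 0).
int8_t selectSmax(int8_t x, int8_t c) { return x > c ? static_cast<int8_t>(x ^ c) : int8_t{0}; }

// Left-hand side: xor(smin(x, C), C).
int8_t xorSmin(int8_t x, int8_t c) { return static_cast<int8_t>(std::min(x, c) ^ c); }
// Right-hand side: select(x < C, xor(x, C), 0).
int8_t selectSmin(int8_t x, int8_t c) { return x < c ? static_cast<int8_t>(x ^ c) : int8_t{0}; }

// Returns true if both folds hold for every pair of i8 values.
bool foldsHoldForAllI8() {
  for (int x = -128; x <= 127; ++x)
    for (int c = -128; c <= 127; ++c) {
      auto X = static_cast<int8_t>(x), C = static_cast<int8_t>(c);
      if (xorSmax(X, C) != selectSmax(X, C) || xorSmin(X, C) != selectSmin(X, C))
        return false;
    }
  return true;
}
```

With C = -1, as in the IR above, the smin fold gives select(x < -1, ~x, 0), which agrees with GCC's single `csinv` (x < 0 ? ~x : 0) because ~(-1) is 0.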

---------

Co-authored-by: Yui5427 <785369607@qq.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-09-16 15:30:57 +00:00
zhijian lin
2771d35e2a
[NFC ]Add a helper function isTailCall for getting libcall in SelectionDAG (#155256)
Based on comment of
https://github.com/llvm/llvm-project/pull/153600#discussion_r2285729269,
Add a helper function isTailCall for getting libcall in SelectionDAG.
2025-09-16 10:17:29 -04:00
Paul Walker
aa1a694846
[LLVM][GlobalISel] Make CSEMIRBuilder::buildConstant scalable vector aware. (#158299) 2025-09-16 11:44:26 +01:00
Matt Arsenault
ea9acc97f1
CodeGen: Surface shouldRewriteCopySrc utility function (#158524)
Change shouldRewriteCopySrc to return the common register
class and expose it as a utility function. I've found myself
reproducing essentially the same logic in multiple places. The
purpose of this function is to just work through the API constraints
of which combination of register class and subreg indexes you have.

i.e. you need to use a different function if you have 0, 1, or 2
subregister indexes involved in a pair of copy-like operations.
2025-09-16 14:53:49 +09:00
Craig Topper
9bedece621
[LegalizeTypes] Use correct type for constant in PromoteIntRes_FunnelShift.
This is a typo from #158553. We should use AmtVT instead of VT.

VT and AmtVT are presumably always the same at this point for the
tested targets.
2025-09-15 15:54:26 -07:00
Craig Topper
bc745dcd78
[LegalizeTypes] Use getShiftAmountConstant in PromoteIntRes_FunnelShift. (#158553) 2025-09-15 10:29:19 -07:00
David Green
1c21d5cb9b
[GlobalISel] Remove GI known bits cache (#157352)
There is a cache on the known bits computed by GlobalISel. It only works
inside a single query to computeKnownBits, which limits its usefulness,
and according to the tests it can sometimes limit the effectiveness of
known-bits queries (although some AMD tests get longer). Keeping the
cache valid and clearing it at the correct times can also require being
careful about the functions called inside known-bits queries.

I measured compile-time of removing it and came up with:
```
Benchmark With cache      Without cache   % change
7zip      2.06405E+11     2.06436E+11     0.015018992
Bullet    1.01298E+11     1.01186E+11     -0.110236169
ClamAV    57942466667     57848066667     -0.16292023
SPASS     45444466667     45402966667     -0.091320249
consumer  35432466667     35381233333     -0.144594317
kimwitu++ 40858833333     40927933333     0.169118877
lencod    70022366667     69950633333     -0.102443457
mafft     38439900000     38413233333     -0.069372362
sqlite3   35822266667     35770033333     -0.145812474
tramp3d   82083133333     82045600000     -0.045726
Average                                   -0.068828739
```
The last column is % difference between with / without the cache. So in
total it seems to be costing slightly more to keep the current
known-bits cache than if it was removed. (Measured in instruction count,
similar to llvm-compile-time-tracker).
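
The last column can be reproduced as (without - with) / with * 100, assuming the first count in each row is with the cache and the second without (which is consistent with the negative average meaning the cache costs time). A small C++ sketch checking the ClamAV row and the reported average; the function and constant names are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <numeric>

// Percentage change in instruction count when the cache is removed.
// Assumes `withCache` is the first count in a row and `withoutCache` the second.
double percentChange(double withCache, double withoutCache) {
  return (withoutCache - withCache) / withCache * 100.0;
}

// Per-benchmark percentages copied from the last column of the table.
constexpr std::array<double, 10> kPercentages = {
    0.015018992,  -0.110236169, -0.16292023,  -0.091320249, -0.144594317,
    0.169118877,  -0.102443457, -0.069372362, -0.145812474, -0.045726};

// Plain arithmetic mean of the ten percentages.
double averagePercentChange() {
  return std::accumulate(kPercentages.begin(), kPercentages.end(), 0.0) /
         kPercentages.size();
}
```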

The hit rate wasn't terrible - higher than I expected. In the
llvm-test-suite+external projects it was hit 4791030 times out of
91107008 queries, slightly more than 5%.

Note that as globalisel increases in complexity, more known bits calls
might be made and the numbers might shift. If that is the case it might
be better to have a cache that works across calls, providing it doesn't
make effectiveness worse.
2025-09-15 07:32:00 +01:00
Craig Topper
4cbf4408e7
[SelectionDAG] Use getShiftAmountConstant. (#158395)
Many of the shifts in LegalizeIntegerTypes.cpp were using getPointerTy.
2025-09-12 19:49:48 -07:00
Craig Topper
4ebd202329
[LegalizeTypes][X86] Use getShiftAmountConstant in ExpandIntRes_SIGN_EXTEND. (#158388)
This ensures we don't need to fix up the shift amount later.

Unfortunately, this enabled the
(SRA (SHL X, ShlConst), SraConst) -> (SRA (sext_in_reg X), SraConst - ShlConst)
combine in combineShiftRightArithmetic for some cases in
is_fpclass-fp80.ll. So we also need to update checkSignTestSetCCCombine
to look through sign_extend_inreg to prevent a regression.
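
In the general expansion, when the source fits in the low half, the low half is the sign-extended input and the high half is the low half shifted arithmetically right by (half width - 1), i.e. copies of the sign bit; that shift amount is the constant now built with getShiftAmountConstant. A C++ sketch modelling an i128 result as two 64-bit halves; `SplitI128` and `expandSignExtend` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an illegal i128 value split into two legal i64 halves.
struct SplitI128 {
  uint64_t Lo;
  uint64_t Hi;
};

// Sign-extend a 32-bit value to 128 bits, producing the two halves directly.
SplitI128 expandSignExtend(int32_t x) {
  int64_t lo = x;        // Lo = sign_extend(x) to the half width
  int64_t hi = lo >> 63; // Hi = sra(Lo, half width - 1): all sign bits
  return {static_cast<uint64_t>(lo), static_cast<uint64_t>(hi)};
}
```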
2025-09-12 19:49:29 -07:00
Craig Topper
0ca54d7738
[LegalizeTypes] Use getShiftAmountConstant in SplitInteger. (#158392)
This function contained old code for handling the case where the type
returned by getScalarShiftAmountTy can't hold the shift amount.

These days this is handled by getShiftAmountTy which is used by
getShiftAmountConstant.
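
SplitInteger produces the low half by truncation and the high half by a logical right shift of the full value by the low half's width, followed by another truncation; that shift amount is the constant in question. A C++ sketch of the same arithmetic on a 64-bit value, with `splitInteger` as a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split a 64-bit value into {Lo, Hi} 32-bit halves:
// Lo = trunc(x), Hi = trunc(srl(x, LoBits)).
std::pair<uint32_t, uint32_t> splitInteger(uint64_t x) {
  constexpr unsigned LoBits = 32;
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> LoBits);
  return {lo, hi};
}
```

For very wide types the shift amount itself can outgrow a small shift-amount type; for example, splitting an i512 needs a shift by 256, which does not fit in 8 bits.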
2025-09-12 18:54:48 -07:00
Afanasyev Ivan
ffcaeca90a
[CodeGen] Fix partial phi input removal in TailDuplicator. (#158265)
The tail duplicator removed only the first PHI incoming value from the
predecessor basic block, while it should remove all operands for that block.

PHI instructions can have duplicate entries for the same predecessor
block:
* `UnreachableMachineBlockElim` assumes that a PHI instruction might have
duplicates:
7289f2cd0c/llvm/lib/CodeGen/UnreachableBlockElim.cpp (L160)
* `AArch64` directly states that a PHI instruction might have duplicates:
7289f2cd0c/llvm/lib/Target/AArch64/AArch64ConditionalCompares.cpp (L244)
* And `Hexagon`:
7289f2cd0c/llvm/lib/Target/Hexagon/HexagonConstPropagation.cpp (L844)

We caught the bug on a custom out-of-tree backend. `TailDuplicator`
should remove all operands corresponding to the removed block.
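
The difference between the old and fixed behaviour can be sketched with a toy PHI represented as a list of (incoming value, predecessor block) pairs. This is an illustration only; `Incoming`, `removeFirstIncoming`, and `removeAllIncoming` are hypothetical and not the MachineInstr operand API used by TailDuplicator:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy PHI entry: an incoming value and the block it comes from.
struct Incoming {
  std::string Value;
  std::string Block;
};

// Old behaviour: only the first entry for Pred is removed.
void removeFirstIncoming(std::vector<Incoming> &Phi, const std::string &Pred) {
  auto It = std::find_if(Phi.begin(), Phi.end(),
                         [&](const Incoming &I) { return I.Block == Pred; });
  if (It != Phi.end())
    Phi.erase(It);
}

// Fixed behaviour: every entry for Pred is removed (erase-remove idiom).
void removeAllIncoming(std::vector<Incoming> &Phi, const std::string &Pred) {
  Phi.erase(std::remove_if(Phi.begin(), Phi.end(),
                           [&](const Incoming &I) { return I.Block == Pred; }),
            Phi.end());
}
```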

Note that the bug likely does not affect in-tree backends, because:
* It happens only in the scenario of **partial** tail duplication (i.e. the tail
block is duplicated into some predecessors, but not all of them)
* It happens in **pre-RA** tail duplication only (post-RA code does not
contain PHIs)
* The only backend I know of that uses pre-RA tail duplication is X86. It uses
tail duplication via the `early-tailduplication` pass, which declines partial
tail duplication via the `canCompletelyDuplicateBB` check, because it uses
the `TailDuplicator::tailDuplicateBlocks` public API.

So the bug happens only in the case of pre-RA partial tail duplication if
a backend uses the `TailDuplicator::tailDuplicate` public API directly.

That's why I cannot add a reproducer test for in-tree backends.
2025-09-13 10:45:54 +09:00
Craig Topper
f32874f77b
[LegalizeIntegerTypes] Use getShiftAmountConstant. 2025-09-12 16:10:01 -07:00
Matt Arsenault
7289f2cd0c
CodeGen: Remove MachineFunction argument from getRegClass (#158188)
This is a low level utility to parse the MCInstrInfo and should
not depend on the state of the function.
2025-09-12 19:22:02 +09:00
Matt Arsenault
2331fbb019
CodeGen: Remove MachineFunction argument from getPointerRegClass (#158185)
getPointerRegClass is a layering violation. Its primary purpose
is to determine how to interpret an MCInstrDesc's operand RegClass
fields. This should be context free, and only depend on the subtarget.
The model of this is also wrong, since this should be an
instruction / operand specific property, not a global pointer class.
Remove the function argument to help stage removal of this hook
and avoid introducing any new obstacles to replacing it.

The remaining uses of the function were to get the subtarget, which
TargetRegisterInfo already belongs to. A few targets needed new
subtarget derived properties copied there.
2025-09-12 09:18:50 +00:00
Owen Anderson
0f13cae7ff
[CodeGen, CHERI] Add capability types to MVT. (#156616)
This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend.

These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types.

Co-authored-by: David Chisnall <theraven@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
2025-09-11 17:44:30 +08:00
Yi-Chi Lee
0c6141a07a
[GlobalISel] Add computeNumSignBits for SHL (#152067)
This patch ports the `ISD::SHL` handling from SelectionDAG’s
`ComputeNumSignBits` to GlobalISel.

Related to #150515.
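
The core rule being ported is that if x has k known sign bits and is shifted left by a constant c < k, the result still has at least k - c sign bits; when c >= k only the trivial bound of 1 remains. A C++ sketch that checks this exhaustively on fully known 8-bit values (the real analysis works on values that are only partially known); all names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit (always at least 1) for an i8.
unsigned numSignBits(int8_t v) {
  auto u = static_cast<uint8_t>(v);
  unsigned sign = u >> 7;
  unsigned n = 1;
  while (n < 8 && ((u >> (7 - n)) & 1u) == sign)
    ++n;
  return n;
}

// Lower bound the rule gives for (x << c), where k = numSignBits(x).
unsigned shlSignBitsBound(unsigned k, unsigned c) { return c < k ? k - c : 1; }

// Check the bound against the actual result for every i8 value and shift amount.
bool shlRuleHolds() {
  for (int x = -128; x <= 127; ++x)
    for (unsigned c = 0; c < 8; ++c) {
      auto shifted = static_cast<int8_t>(static_cast<uint8_t>(static_cast<uint8_t>(x) << c));
      if (numSignBits(shifted) < shlSignBitsBound(numSignBits(static_cast<int8_t>(x)), c))
        return false;
    }
  return true;
}
```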
2025-09-11 16:00:30 +09:00
Abhishek Kaushik
1278ac71d3
[NFC][GlobalISel] Pass APInt by const reference (#157827)
Change `SpecificConstantMatch` constructor and `isBuildVectorConstantSplat` overloads to take `const APInt&` instead of by value to avoid unnecessary copies, especially for wide integers.
2025-09-11 11:11:14 +05:30
Shaoce SUN
41d7ae84e5
[RISCV][GlobalIsel] Lower G_FMINIMUMNUM, G_FMAXIMUMNUM (#157295)
Similar to the implementation in
https://github.com/llvm/llvm-project/pull/104411, the `fmin.s`/`fmax.s`
instructions follow IEEE 754-2019 semantics, so
`G_FMINIMUMNUM`/`G_FMAXIMUMNUM` are legal.
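
IEEE 754-2019 minimumNumber/maximumNumber return the non-NaN operand when exactly one input is a NaN, and order -0.0 below +0.0. A C++ sketch of those semantics for `float`; the function names are hypothetical, and floating-point exceptions and NaN payloads are not modelled:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch of IEEE 754-2019 minimumNumber for float.
float minimumNumber(float a, float b) {
  if (std::isnan(a) && std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (std::isnan(a))
    return b;
  if (std::isnan(b))
    return a;
  if (a == 0.0f && b == 0.0f) // +0.0 == -0.0 compares equal, so use the sign bit
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}

// Sketch of IEEE 754-2019 maximumNumber for float: prefers +0.0 over -0.0.
float maximumNumber(float a, float b) {
  if (std::isnan(a) && std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (std::isnan(a))
    return b;
  if (std::isnan(b))
    return a;
  if (a == 0.0f && b == 0.0f)
    return std::signbit(a) ? b : a;
  return a > b ? a : b;
}
```

Unlike `std::fmin`/`std::fmax`, which are not required to order signed zeros, this sketch makes the -0.0 < +0.0 rule explicit.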
2025-09-11 10:16:42 +08:00
woruyu
c69172637e
[RISCV][GISel] Lower G_SADDE (#156865)
### Summary
Implement lowering of G_SADDE in LegalizerHelper::lower.
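
G_SADDE adds with a carry-in and, as we understand the opcode, reports signed overflow as its second result. A minimal C++ sketch of those semantics on 8-bit values, assuming a lowering to a wrapping addition plus sign-bit tests; `sadde8` and `saddeMatchesWideArithmetic` are hypothetical names, and this is not the code in LegalizerHelper:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model of G_SADDE on i8: returns {a + b + carryIn (wrapped), overflow}.
std::pair<int8_t, bool> sadde8(int8_t a, int8_t b, bool carryIn) {
  // Wrapping addition in unsigned arithmetic avoids signed-overflow UB.
  uint8_t sum = static_cast<uint8_t>(static_cast<uint8_t>(a) + static_cast<uint8_t>(b) +
                                     (carryIn ? 1u : 0u));
  int8_t res = static_cast<int8_t>(sum);
  // Overflow only if both operands have the same sign and the result's sign differs.
  bool overflow = ((a ^ res) & (b ^ res)) < 0;
  return {res, overflow};
}

// Compare against exact arithmetic in a wider type for every i8 input.
bool saddeMatchesWideArithmetic() {
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b)
      for (int carry = 0; carry <= 1; ++carry) {
        int exact = a + b + carry;
        auto r = sadde8(static_cast<int8_t>(a), static_cast<int8_t>(b), carry != 0);
        bool expectOverflow = exact < INT8_MIN || exact > INT8_MAX;
        if (r.second != expectOverflow ||
            static_cast<uint8_t>(r.first) != static_cast<uint8_t>(exact))
          return false;
      }
  return true;
}
```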
2025-09-11 09:32:56 +08:00
Craig Topper
8f8429540e
[ExpandVectorPredication] Keep the original value name when expanding predicated instructions. (#157943)
This makes it easier to follow a value through the pass. If we pass the
original name to the create function, a number will be added as a suffix
since the original name is still used until it is replaced.
2025-09-10 16:18:11 -07:00
Arthur Eubanks
984251acad
Revert "[DAGCombiner] Relax condition for extract_vector_elt combine" (#157953)
Reverts llvm/llvm-project#157658

Causes hangs, see
https://github.com/llvm/llvm-project/pull/157658#issuecomment-3276441812
2025-09-10 21:33:44 +00:00
Craig Topper
262c7b7b5a
[RISCV][GISel] Widen G_ABDS/G_ABDU before lowering when Zbb is enabled. (#157766)
This allows us to use G_SMIN/SMAX/UMIN/UMAX in the lowering.
2025-09-10 12:17:30 -07:00
Craig Topper
397e5a457a
[ExpandVectorPredication] Expand vp_merge and vp_select in expandPredication. (#157777) 2025-09-10 08:50:30 -07:00
jyli0116
619d36ff4f
[GISel] Combine shift + trunc + shift pattern (#155583)
Folds the shift(trunc(shift(...))) pattern into trunc(shift(...)) by
combining the two shift instructions.
2025-09-10 15:01:55 +01:00
Jay Foad
349544d7ab
[CodeGen] Fix handling dead redefs in finalizeBundle (#157427)
A dead redefinition should override any earlier non-dead definition
inside a bundle.

Also remove KilledDefSet since it can be folded into DeadDefSet.
2025-09-10 12:48:12 +01:00
Frederik Harwath
ffcf82c4a8
[AMDGPU] Change expand-fp opt level argument syntax (#157408)
Align the syntax used for the optimization level argument of the
expand-fp pass in textual descriptions of pass pipelines with the syntax
used by other passes taking a similar argument. That is, use e.g.
`expand-fp<O1>` instead of `expand-fp<opt-level=1>`.
2025-09-10 10:44:28 +02:00
ZhaoQi
4621e17dee
[DAGCombiner] Relax condition for extract_vector_elt combine (#157658)
Checking `isOperationLegalOrCustom` instead of `isOperationLegal` allows
more optimization opportunities. In particular, if a target wants to
mark `extract_vector_elt` as `Custom` rather than `Legal` in order to
optimize certain cases, this combiner would otherwise miss some
improvements.

Previously, using `isOperationLegalOrCustom` was avoided due to the risk
of getting stuck in infinite loops (as noted in
61ec738b60).
After testing, the issue no longer reproduces, but the coverage is
limited to the regression/unit tests and the test-suite.
2025-09-10 15:51:52 +08:00