38246 Commits

Author SHA1 Message Date
Ye Tian
db843e5d09
[DAG] Add ISD::FP_TO_SINT_SAT/FP_TO_UINT_SAT handling to SelectionDAG::canCreateUndefOrPoison (#154244)
Related to https://github.com/llvm/llvm-project/issues/153366
2025-08-19 10:45:11 +09:00
黃國庭
0773854575
[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have sufficient leading zero/sign bits (#152273)
avgceil version :  https://alive2.llvm.org/ce/z/2CKrRh  

Fixes #147773 

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-18 16:36:26 +01:00
David Green
03912a1de5
[GlobalISel] Translate scalar sequential vecreduce.fadd/fmul as fadd/fmul. (#153966)
A llvm.vector.reduce.fadd(float, <1 x float>) will be translated to
G_VECREDUCE_SEQ_FADD with two scalar operands, which is illegal
according to the verifier. This makes sure we generate a fadd/fmul
instead.
2025-08-18 14:59:44 +00:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
2025-08-18 07:01:29 -07:00
Simon Pilgrim
858d1dfa2c
[DAG] visitTRUNCATE - early out from computeKnownBits/ComputeNumSignBits failures. NFC. (#154111)
Avoid unnecessary (costly) computeKnownBits/ComputeNumSignBits calls - use MaskedValueIsZero instead of computeKnownBits directly to simplify code.
2025-08-18 14:55:09 +01:00
jofrn
e8e3e6e893
[LiveVariables] Mark use without implicit if defined at instr (#119446)
LiveVariables will mark instructions with their implicit subregister
uses. However, it will also mark the subregister as an implicit if its
own definition is a subregister of it, i.e. `$r3 = OP val, implicit-def
$r0_r1_r2_r3, ..., implicit $r2_r3`, even if it is otherwise unused,
which defines $r3 on the same line it is used.

This change ensures such uses are marked without implicit, i.e. `$r3 =
OP val, implicit-def $r0_r1_r2_r3, ..., $r2_r3`.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-08-18 08:34:59 -04:00
Simon Pilgrim
681ecae913
[DAG] visitTRUNCATE - test abd legality early to avoid unnecessary computeKnownBits/ComputeNumSignBits calls. NFC. (#154085)
isOperationLegal is much cheaper than value tracking
2025-08-18 11:06:29 +01:00
Matt Arsenault
53e9d3247e
DAG: Remove unnecessary getPointerTy call (#154055)
getValueType already did this
2025-08-18 17:12:16 +09:00
Nikita Popov
238c3dcd0d
[CodeGen][Mips] Remove fp128 libcall list (#153798)
Mips requires fp128 args/returns to be passed differently than i128. It
handles this by inspecting the pre-legalization type. However, for soft
float libcalls, the original type is currently not provided (it will
look like a i128 call). To work around that, MIPS maintains a list of
libcalls working on fp128.

This patch removes that list by providing the original, pre-softening
type to calling convention lowering. This is done by carrying additional
information in CallLoweringInfo, as we unfortunately do need both types
(we want the un-softened type for OrigTy, but we need the softened type
for the actual register assignment etc.)

This is in preparation for completely removing all the custom
pre-analysis code in the Mips backend and replacing it with use of
OrigTy.
2025-08-18 09:22:41 +02:00
Matt Arsenault
3e5d8a1439 Reapply "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864)
This reverts commit 334e9bf2dd01fbbfe785624c0de477b725cde6f2.

Check if llvm-nm exists before building the benchmark.
2025-08-16 09:53:50 +09:00
gulfemsavrun
334e9bf2dd
Revert "RuntimeLibcalls: Generate table of libcall name lengths (#153… (#153864)
…210)"

This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3.

Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)"

This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de.

Revert "TableGen: Emit statically generated hash table for runtime
libcalls (#150192)"

This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05.

Reverted three changes because of a CMake error while building llvm-nm
as reported in the following PR:
https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
2025-08-15 13:32:27 -07:00
Min-Yih Hsu
0e9b6d6c8a
[IA][RISCV] Detecting gap mask from a mask assembled by interleaveN intrinsics (#153510)
If the mask of a (fixed-vector) deinterleaved load is assembled by
`vector.interleaveN` intrinsic, any intrinsic arguments that are
all-zeros are regarded as gaps.
2025-08-15 09:22:47 -07:00
Craig Topper
853094fd81 [VirtRegMap] Use TRI member variable. NFC 2025-08-15 09:14:09 -07:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Nikita Popov
11c2240049 [SDAGBuilder] Rename RetTys -> RetVTs (NFC)
Make it clearer that this is a vector of EVTs, not IR types.

Based on:
https://github.com/llvm/llvm-project/pull/153798#discussion_r2279066696
2025-08-15 17:06:33 +02:00
Philip Reames
606937474e
[SDAG] Remove IndexType manipulation in getUniformBase and callers (#151578)
All paths set it to the same value, just propagate that value to the
consumer.
2025-08-15 08:00:47 -07:00
Simon Pilgrim
bcb4984a0b [X86] select-smin-smax.ll - add i128 tests
Helps check quality of legality codegen (all we had was x86 i64 handling)
2025-08-15 13:48:13 +01:00
Diana Picus
ac005e16f6
Reapply "[AMDGPU] Intrinsic for launching whole wave functions" (#153584)
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a cmake issue which has been
discussed in more detail in this Discourse post:

https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901

If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:

https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds

The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Tail calls are handled in a future patch.
2025-08-15 10:12:47 +02:00
Cullen Rhodes
393c21137e
[StatepointLowering] Use FrameIndex instead of TargetFrameIndex (#153555)
TargetFrameIndex shouldn't be used as an operand to target-independent
node such as a load. This causes ISel issues.

#81635 fixed a similar issue with this code using a TargetConstant,
instead of a Constant.

Fixes #142314.
2025-08-15 08:02:31 +01:00
Matt Arsenault
cb1228fbd5
RuntimeLibcalls: Return StringRef for libcall names (#153209)
Does not yet fully propagate this down into the TargetLowering
uses, many of which are relying on null checks on the returned
value.
2025-08-15 09:55:39 +09:00
Kyungwoo Lee
07d3a73d70 Revert "[CGData] Lazy loading support for stable function map (#151660)"
This reverts commit 76dd742f7b32e4d3acf50fab1dbbd897f215837e.
2025-08-14 16:56:54 -07:00
Min-Yih Hsu
abe92a5000
[DAGCombine] Fix an incorrect folding of extract_subvector (#153709)
Reported from
https://github.com/llvm/llvm-project/pull/153393#issuecomment-3189898813

During DAGCombine, an intermediate extract_subvector sequence was
generated:
```
  t8: v9i16 = extract_subvector t3, Constant:i64<9>
t24: v8i16 = extract_subvector t8, Constant:i64<0>
```
And one of the DAGCombine rule which turns `(extract_subvector
(extract_subvector X, C), 0)` into `(extract_subvector X, C)` kicked in
and turn that into `v8i16 = extract_subvector t3, Constant:i64<9>`. But
it forgot to check if the extracted index is a multiple of the minimum
vector length of the result type, hence the crash.

This patch fixes this by adding an additional check.
2025-08-14 23:37:22 +00:00
Zhaoxuan Jiang
76dd742f7b
[CGData] Lazy loading support for stable function map (#151660)
The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in
lazy loading support for the stable function map. The detailed changes
are:

- `StableFunctionMap`
- The map now stores entries in an `EntryStorage` struct, which includes
offsets for serialized entries and a `std::once_flag` for thread-safe
lazy loading.
- The underlying map type is changed from `DenseMap` to
`std::unordered_map` for compatibility with `std::once_flag`.
- `contains()`, `size()` and `at()` are implemented to only load
requested entries on demand.

- Lazy Loading Mechanism
- When reading indexed codegen data, if the newly-introduced
`-indexed-codegen-data-lazy-loading` flag is set, the stable function
map is not fully deserialized up front. The binary format for the stable
function map now includes offsets and sizes to support lazy loading.
- The safety of lazy loading is guarded by the once flag per function
hash. This guarantees that even in a multi-threaded environment, the
deserialization for a given function hash will happen exactly once. The
first thread to request it performs the load, and subsequent threads
will wait for it to complete before using the data. For single-threaded
builds, the overhead is negligible (a single check on the once flag).
For multi-threaded scenarios, users can omit the flag to retain the
previous eager-loading behavior.
2025-08-14 13:49:09 -07:00
Min-Yih Hsu
c202d2f515
[IA][RISCV] Recognizing gap masks assembled from bitwise AND (#153324)
For a deinterleaved masked.load / vp.load, if it's mask, `%c`, is
synthesized by the following snippet:
```
%m = shufflevector %s, poison, <0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3>
%g = <1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0>
%c = and %m, %g
```
Then we can know that `%g` is the gap mask and `%s` is the mask for each
field / component. This patch teaches InterleaveAccess pass to recognize
such patterns
2025-08-14 11:17:50 -07:00
Nikita Popov
16ad20291d [TargetLowering] Store Context in variable (NFC)
Avoid repeating CLI.RetTy->getContext() many times.
2025-08-14 17:13:31 +02:00
Björn Pettersson
5e7924a3cb
[SelectionDAG] Handle more opcodes in isGuaranteedNotToBeUndefOrPoison (#147019)
Add special handling of EXTRACT_SUBVECTOR, INSERT_SUBVECTOR,
EXTRACT_VECTOR_ELT, INSERT_VECTOR_ELT and SCALAR_TO_VECTOR in
isGuaranteedNotToBeUndefOrPoison. Make use of DemandedElts to improve
the analysis and only check relevant elements for each operand.

Also start using DemandedElts in the recursive calls that check
isGuaranteedNotToBeUndefOrPoison for all operands for operations that do
not create undef/poison. We can do that for a number of elementwise
operations for which the DemandedElts can be applied to every operand
(e.g. ADD, OR, BITREVERSE, TRUNCATE).
2025-08-14 09:05:15 +00:00
Nikita Popov
d1952baa5d [CodeGen] Remove unnecessary setTypeListBeforeSoften() parameter (NFC)
It does not make sense to set the softening type list without
setting IsSoften=true.
2025-08-14 10:04:56 +02:00
XChy
f393f2a61e
[BranchFolding] Avoid moving blocks to fall through to an indirect target (#152916)
Depend on #152591 to fix
https://github.com/llvm/llvm-project/issues/149023.
Similar to an EH pad, there is no real advantage in "falling through" to
an indirect target of an INLINEASM_BR. And multiple indirect targets of
inline asm at the end of a function may be rotated infinitely.
Therefore, this patch avoids such optimization on indirect target of
inline asm as fall through.
2025-08-14 16:18:36 +09:00
Craig Topper
9f96e3f80f
[SelectionDAG] Pass SDValue to InstrEmitter::EmitCopyFromReg. NFC (#153485)
Instead of passing SDNode and ResNo separately.

This allows us to use SDValue::operator== and avoid creating SDValue
from the operands inside the function.
2025-08-13 21:46:48 -07:00
Carl Ritson
e4fd6ba682
[PHIElimination] Preserve MachinePostDominatorTree (#153346)
Minor changes to allow preservation of post dominator tree through PHI
elimination pass.
Also remove duplicate retrieval of dominator tree analysis.

This is a speculative change to support reworking on passes in AMDGPU
backend.
2025-08-14 10:50:46 +09:00
David Green
c5105c1e0a
[GlobalISel] Fix bitcast fewerElements with scalar narrow types. (#153364)
For a <8 x i32> -> <2 x i128> bitcast, that under aarch64 is split into
two halfs, the scalar i128 remainder was causing problems, causing a
crash with invalid vector types. This makes sure they are handled
correctly in fewerElementsBitcast.
2025-08-13 22:27:53 +01:00
Amy Kwan
63cc2e390d
[PowerPC][CodeGen] Expand ISD::AssertNoFPClass for ppc_fp128 (#152357)
780054d3ff18075a6bc433029f336931792b1d2d added support for
`ISD::AssertNoFPClass`.

This ISD node can be used with the `ppc_fp128` type, which is really
just two `f64s` and requires expanding when used with
`ISD::AssertNoFPClass`. Without the support for expanding the result, we
get an assertion because the legalizer does not know how to expand the
results of `ppc_fp128` with `ISD::AssertNoFPClass`.
```
ExpandFloatResult #0: t7: ppcf128 = AssertNoFPClass t5, TargetConstant:i32<3>

LLVM ERROR: Do not know how to expand the result of this operator!
```
Thus, this patch aims to add support for the expand so we no longer
assert.

This fixes #151375.
2025-08-13 15:00:32 -04:00
Nikita Popov
498ef361fe
[CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414)
https://github.com/llvm/llvm-project/pull/152709 exposed the original IR
argument type to the CC lowering logic. However, in SDAG, this used the
raw type, prior to aggregate splitting. This PR changes it to use the
non-aggregate type instead. (This matches what happened in the
GlobalISel case already.)

I've also added some more detailed documentation on the
InputArg/OutputArg fields, to explain how they differ. In most cases
ArgVT is going to be the EVT of OrigTy, so they encode very similar
information (OrigTy just preserves some additional information lost in
EVTs, like pointer types). One case where they do differ is in
post-legalization lowering of libcalls, where ArgVT is going to be a
legalized type, while OrigTy is going to be the original non-legalized
type.
2025-08-13 18:42:26 +02:00
Nikita Popov
240c454c4d
[CodeGen] Remove default ctors for InputArg and OutputArg (#153205)
These make it easy to forget to initialize some members, like the newly
added OrigTy. Force these to always go through the ctor instead.
2025-08-13 10:51:43 +02:00
Matt Arsenault
db126d8004
CodeGen: Make MachineFunction's subtarget member a reference (#153352) 2025-08-13 16:22:32 +09:00
Shoreshen
db96363c0a
[AMDGPU] Avoid put implicit_def into bundle that break reg's liveness (#142563)
Cause:
1. `implicit_def` inside bundle does not count for define of reg in
machineinst verifier
2. Including `implicit_def` will cause relative reg not define, result
in `Bad machine code: Using an undefined physical register` in the
machineinst verifier

Fixes https://github.com/llvm/llvm-project/issues/139102

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-08-13 10:41:44 +08:00
Philip Reames
49b17a0c1c
[MIR] Further cleanup on mutliple save/restore point support [nfc] (#153250)
Remove the type alias now that the std::variant aspect is gone, directly
using std::vector in the few places that need it is more idiomatic.

Move a routine from a core header to single user.
2025-08-12 14:16:41 -07:00
Min-Yih Hsu
ca05058b49
[IA][RISCV] Recognize deinterleaved loads that could lower to strided segmented loads (#151612)
Turn the following deinterleaved load patterns
```
%l = masked.load(%ptr, /*mask=*/110110110110, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
%f2 = shufflevector %l, [2, 5, 8, 11]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1111)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
%f2 = poison
```
The mask `110110110110` is regarded as 'gap mask' since it effectively skips the entire third field / component.

Similarly,  turning the following snippet
```
%l = masked.load(%ptr, /*mask=*/110000110000, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1010)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
```

Right now this patch only tries to detect gap mask from a constant mask supplied to a masked.load/vp.load.
2025-08-12 14:08:18 -07:00
Philip Reames
4d629f9744
[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226)
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it bacame clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
2025-08-12 11:23:05 -07:00
Daniel Paoliello
c430e06fb5
[win][arm64ec] Fix duplicate errors with the dontcall attribute (#152810)
Since the `dontcall-*` attributes are checked both by
`FastISel`/`GlobalISel` and `SelectionDAGBuilder`, and both `FastISel`
and `GlobalISel` bail for calls on Arm64EC for AFTER doing the check, we
ended up emitting duplicate copies of this error.

This change moves the checking for `dontcall-*` in `FastISel` and
`GlobalISel` to after it has been successfully lowered.
2025-08-12 11:05:07 -07:00
Elizaveta Noskova
bbde6be841
[llvm] Support multiple save/restore points in mir (#119357)
Currently mir supports only one save and one restore point
specification:

```
  savePoint:       '%bb.1'
  restorePoint:    '%bb.2'
```

This patch provide possibility to have multiple save and multiple
restore points in mir:

```
  savePoints:
    - point:           '%bb.1'
  restorePoints:
    - point:           '%bb.2'
```

Shrink-Wrap points split Part 3.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581

Part 1: https://github.com/llvm/llvm-project/pull/117862
Part 2: https://github.com/llvm/llvm-project/pull/119355
Part 4: https://github.com/llvm/llvm-project/pull/119358
Part 5: https://github.com/llvm/llvm-project/pull/119359
2025-08-12 16:34:29 +03:00
XChy
2a49719525
[SelectionDAGBuilder] Look for appropriate INLINEASM_BR instruction to verify (#152591)
Partially fix #149023.
The original code `MRI.def_begin(Reg)->getParent()` may return the
incorrect MI, as the physical register `Reg` may have multiple
definitions.
This patch selects the correct MI to verify by comparing the MBB of each
definition.

New testcase hangs with -O1/2/3 enabled. The BranchFolding may be to
blame.
2025-08-12 12:37:56 +00:00
David Sherwood
7f763d9b48
[AArch64] Support symmetric complex deinterleaving with higher factors (#151295)
For loops such as this:

```
struct foo {
  double a, b;
};

void foo(struct foo *dst, struct foo *src, int n) {
  for (int i = 0; i < n; i++) {
    dst[i].a += src[i].a * 3.2;
    dst[i].b += src[i].b * 3.2;
  }
}
```

the complex deinterleaving pass will spot that the deinterleaving
associated with the structured loads cancels out the interleaving
associated with the structured stores. This happens even though
they are not truly "complex" numbers because the pass can handle
symmetric operations too. This is great because it means we can
then perform normal loads and stores instead. However, we can also
do the same for higher interleave factors, e.g. 4:

```
struct foo {
  double a, b, c, d;
};

void foo(struct foo *dst, struct foo *src, int n) {
  for (int i = 0; i < n; i++) {
    dst[i].a += src[i].a * 3.2;
    dst[i].b += src[i].b * 3.2;
    dst[i].c += src[i].c * 3.2;
    dst[i].d += src[i].d * 3.2;
  }
}
```

This PR extends the pass to effectively treat such structures as
a set of complex numbers, i.e.

```
struct foo_alt {
  std::complex<double> x, y;
};
```

with equivalence between members:

```
  foo_alt.x.real == foo.a
  foo_alt.x.imag == foo.b
  foo_alt.y.real == foo.c
  foo_alt.y.imag == foo.d
```

I've written the code to handle sets with arbitrary numbers of
complex values, but since we only support interleave factors
between 2 and 4 I've restricted the sets to 1 or 2 complex
numbers. Also, for now I've restricted support for interleave
factors of 4 to purely symmetric operations only. However, it
could also be extended to handle complex multiplications,
reductions, etc.

Fixes: https://github.com/llvm/llvm-project/issues/144795
2025-08-12 11:05:15 +01:00
Seraphimt
296e057d0b
[DAG] SelectionDAG::canCreateUndefOrPoison - add ISD::FMA/FMAD + tests (#152187)
In SelectionDAG::canCreateUndefOrPoison add case ISD::FMA/FMAD + tests.
Fixing #147693

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-08-12 17:17:46 +09:00
Benjamin Maxwell
d0c9599c41
[AArch64][SME] Use entry pstate.sm for conditional streaming-mode changes (#152169)
We only do conditional streaming mode changes in two cases:

- Around calls in streaming-compatible functions that don't have a
streaming body
- At the entry/exit of streaming-compatible functions with a streaming
body

In both cases, the condition depends on the entry pstate.sm value. Given
this, we don't need to emit calls to __arm_sme_state at every mode
change.

This patch handles this by placing a "AArch64ISD::ENTRY_PSTATE_SM" node
in the entry block and copying the result to a register. The register is
then used whenever we need to emit a conditional streaming mode change.
The "ENTRY_PSTATE_SM" node expands to a call to "__arm_sme_state" only
if (after SelectionDAG) the function is determined to have
streaming-mode changes.

This has two main advantages:

1. It allows back-to-back conditional smstart/stop pairs to be folded
2. It has the correct behaviour for EH landing pads
- These are entered with pstate.sm = 0, and should switch mode based on
the entry pstate.sm
   - Note: This is not fully implemented yet
2025-08-12 09:15:30 +01:00
Stephen Long
19ada02086
PreISelIntrinsicLowering: Lower llvm.log to a loop if scalable vec arg (#129744)
Similar to ab976a1, but for llvm.log.
2025-08-12 01:04:28 +09:00
Wesley Wiser
40a469f79a
Reapply "[X86] Correct 32-bit immediate assertion and fix 64-bit lowering for huge frame offsets" (#152239)
The first commit is identical to
69bec0afbb8f2aa0021d18ea38768360b16583a9.

The second commit fixes the instruction verification failures by
replacing the erroneous instruction with a trap after the error is
reported and adds `-verify-machineinstrs` to the tests added in the
original PR to catch the issue sooner.

After that change, all tests pass with both
`LLVM_ENABLE_EXPENSIVE_CHECKS={On,Off}`.

cc @RKSimon @e-kud @phoebewang @arsenm as reviewers on the original PR
2025-08-11 21:23:44 +05:30
Fabian Ritter
96775e9229
[GISel] Handle Flags in G_PTR_ADD Combines (#152495)
So far, GlobalISel's G_PTR_ADD combines have ignored MIFlags like nuw, nusw,
and inbounds. That was in many cases unnecessarily conservative and in others
unsound, since reassociations re-used the existing G_PTR_ADD instructions
without invalidating their flags. This patch aims to improve that.

I've checked the transforms in this PR with Alive2 on corresponding middle-end
IR constructs.

A longer-term goal would be to encapsulate the logic that determines which
GEP/ISD::PTRADD/G_PTR_ADD flags can be preserved in which case, since this
occurs in similar forms in the middle end, the SelectionDAG combines, and the
GlobalISel combines here.

For SWDEV-516125.
2025-08-11 10:34:45 +02:00
Nikita Popov
e92b7e9641
[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from a i128 argument.

The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).

This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet, this just does
the mechanical changes to thread through the new argument.
2025-08-11 08:57:53 +02:00
Yingwei Zheng
62735d26b1
[DAGCombine] Correctly extend the constant RHS in TargetLowering::SimplifySetCC (#152862)
In https://github.com/llvm/llvm-project/pull/150270, when the predicate
is eq/ne and the trunc has only an nsw flag, the RHS is incorrectly
zero-extended.

Closes https://github.com/llvm/llvm-project/issues/152630.
2025-08-10 01:24:37 +08:00