700 Commits

Author SHA1 Message Date
woruyu
1a172b9924
[RISCV][GISel] Lower G_SSUBE (#157855)
### Summary
Try to implement lowering of G_SSUBE in LegalizerHelper::lower
2025-09-18 10:08:56 +08:00
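As a rough model of what a G_SSUBE lowering has to compute, here is a plain C++ sketch on 32-bit values. It is not the `LegalizerHelper` code, and `ssube32`/`SubBorrowResult` are hypothetical names. The result is `LHS - RHS - BorrowIn` modulo 2^32, plus a flag that is set when the exact signed result does not fit.

```cpp
#include <cstdint>

// Hypothetical model of G_SSUBE semantics on s32 (not LLVM code).
struct SubBorrowResult {
  int32_t Res;    // LHS - RHS - BorrowIn, wrapped modulo 2^32
  bool Overflow;  // true if the exact signed result is out of range
};

SubBorrowResult ssube32(int32_t LHS, int32_t RHS, bool BorrowIn) {
  // Work on unsigned values so wrap-around is well defined.
  uint32_t A = static_cast<uint32_t>(LHS);
  uint32_t B = static_cast<uint32_t>(RHS);
  uint32_t R = A - B - (BorrowIn ? 1u : 0u);
  // A subtraction (with or without borrow-in) overflows exactly when the
  // operands have different signs and the result's sign differs from LHS.
  bool Overflow = (((A ^ B) & (A ^ R)) >> 31) != 0;
  // uint32_t -> int32_t is modular on mainstream compilers (guaranteed in C++20).
  return {static_cast<int32_t>(R), Overflow};
}
```

The sign-bit test avoids any wider arithmetic, which is the usual shape of such an expansion.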
Shaoce SUN
41d7ae84e5
[RISCV][GlobalIsel] Lower G_FMINIMUMNUM, G_FMAXIMUMNUM (#157295)
Similar to the implementation in
https://github.com/llvm/llvm-project/pull/104411, the `fmin.s`/`fmax.s`
instructions follow IEEE 754-2019 semantics, and
`G_FMINIMUMNUM`/`G_FMAXIMUMNUM` are legal.
2025-09-11 10:16:42 +08:00
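For reference, a minimal C++ sketch of the IEEE 754-2019 minimumNumber/maximumNumber behaviour that these opcodes model. The function names are hypothetical, and the sketch does not distinguish signaling from quiet NaNs. A single NaN operand is ignored, and -0.0 orders below +0.0.

```cpp
#include <cmath>

// Sketch of IEEE 754-2019 minimumNumber semantics.
double minimumNumber(double A, double B) {
  if (std::isnan(A)) return B;  // a NaN operand is ignored; if both are
  if (std::isnan(B)) return A;  // NaN, a NaN is returned
  if (A == 0.0 && B == 0.0)     // -0.0 is treated as less than +0.0
    return std::signbit(A) ? A : B;
  return A < B ? A : B;
}

// Sketch of IEEE 754-2019 maximumNumber semantics.
double maximumNumber(double A, double B) {
  if (std::isnan(A)) return B;
  if (std::isnan(B)) return A;
  if (A == 0.0 && B == 0.0)     // +0.0 is treated as greater than -0.0
    return std::signbit(A) ? B : A;
  return A > B ? A : B;
}
```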
woruyu
c69172637e
[RISCV][GISel] Lower G_SADDE (#156865)
### Summary
Try to implement lowering of G_SADDE in LegalizerHelper::lower
2025-09-11 09:32:56 +08:00
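The add-with-carry counterpart can be modelled the same way. This is a hedged C++ sketch with hypothetical names (`sadde32`, `AddCarryResult`), not the legalizer's code. Signed overflow of `LHS + RHS + CarryIn` occurs exactly when both operands share a sign and the result's sign differs.

```cpp
#include <cstdint>

// Hypothetical model of G_SADDE semantics on s32 (not LLVM code).
struct AddCarryResult {
  int32_t Res;    // LHS + RHS + CarryIn, wrapped modulo 2^32
  bool Overflow;  // true if the exact signed result is out of range
};

AddCarryResult sadde32(int32_t LHS, int32_t RHS, bool CarryIn) {
  uint32_t A = static_cast<uint32_t>(LHS);
  uint32_t B = static_cast<uint32_t>(RHS);
  uint32_t R = A + B + (CarryIn ? 1u : 0u);
  // Same-sign operands whose result has the opposite sign => overflow.
  bool Overflow = ((~(A ^ B) & (A ^ R)) >> 31) != 0;
  // uint32_t -> int32_t is modular on mainstream compilers (guaranteed in C++20).
  return {static_cast<int32_t>(R), Overflow};
}
```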
Craig Topper
262c7b7b5a
[RISCV][GISel] Widen G_ABDS/G_ABDU before lowering when Zbb is enabled. (#157766)
This allows us to use G_SMIN/SMAX/UMIN/UMAX in the lowering.
2025-09-10 12:17:30 -07:00
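With signed/unsigned min and max available, an absolute difference is simply `max(a, b) - min(a, b)`. The sketch below uses hypothetical helper names and plain C++ rather than MIR. It shows that formula on 32-bit operands sign-extended to 64 bits, loosely mirroring widening to XLEN on RV64 before lowering.

```cpp
#include <algorithm>
#include <cstdint>

// abds: |A - B| for signed inputs; the result is read as unsigned.
uint32_t abds32(int32_t A, int32_t B) {
  // Sign-extend to 64 bits so the subtraction cannot overflow.
  int64_t WA = A, WB = B;
  int64_t Diff = std::max(WA, WB) - std::min(WA, WB);  // smax - smin
  return static_cast<uint32_t>(Diff);                  // truncate back to 32 bits
}

// abdu: |A - B| for unsigned inputs.
uint32_t abdu32(uint32_t A, uint32_t B) {
  return std::max(A, B) - std::min(A, B);  // umax - umin never wraps
}
```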
Shaoce SUN
eb623e650b
[RISCV][GISel] Lower G_ABDS and G_ABDU (#155888)
Implementation follows the `ISD::ABDS` handling in
`RISCVTargetLowering`.
2025-09-05 21:16:35 +08:00
Amara Emerson
4829dedfa9
[GlobalISel] Add multi-way splitting support for wide scalar shifts. (#155353)
This patch implements direct N-way splitting for wide scalar shifts
instead of recursive binary splitting. For example, an i512 G_SHL can
now be split directly into 8 i64 operations rather than going through
i256 -> i128 -> i64.

The main motivation behind this is to alleviate (although not entirely
fix) pathological compile-time issues with huge types, like i4224. The
problem we see is that the recursive splitting strategy, combined with
our messy artifact combiner, ends up with terribly long compiles as tons
of intermediate artifacts are generated and then repeatedly re-combined
ad nauseam.

Going directly from the large shifts to the destination types
short-circuits a lot of these issues, but it's still an abuse of the
backend, and front-ends should never be doing this sort of thing.
2025-09-03 10:25:52 -07:00
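The arithmetic behind a direct N-way split of a left shift can be modelled in plain C++. This is a hedged sketch with a hypothetical `shlLimbs` helper on little-endian 64-bit limbs, not the MIR the legalizer emits (which also has to handle variable shift amounts). Each output limb is computed from at most two input limbs, with no intermediate i256/i128 steps.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Shift a little-endian array of 64-bit limbs left by Amt bits.
// Every output limb depends on at most two source limbs, so an i512
// shift becomes 8 independent i64 computations.
std::vector<uint64_t> shlLimbs(const std::vector<uint64_t> &In, unsigned Amt) {
  const size_t N = In.size();
  std::vector<uint64_t> Out(N, 0);
  const size_t LimbShift = Amt / 64;   // whole limbs to move
  const unsigned BitShift = Amt % 64;  // remaining bits within a limb
  for (size_t I = LimbShift; I < N; ++I) {
    uint64_t Part = In[I - LimbShift] << BitShift;
    // Bits carried in from the next-lower source limb, if there is one.
    if (BitShift != 0 && I > LimbShift)
      Part |= In[I - LimbShift - 1] >> (64 - BitShift);
    Out[I] = Part;
  }
  return Out;  // amounts >= 64 * N leave every limb zero
}
```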
Kane Wang
7d6e72f110
[RISCV][GlobalISel] Lower G_ATOMICRMW_SUB via G_ATOMICRMW_ADD (#155972)
RISCV does not provide a native atomic subtract instruction, so this
patch lowers `G_ATOMICRMW_SUB` by negating the RHS value and performing
an atomic add. The legalization rules in `RISCVLegalizerInfo` are
updated accordingly, with libcall fallbacks when `StdExtA` is not
available, and intrinsic legalization is extended to support
`riscv_masked_atomicrmw_sub`.

For example, lowering

`%1 = atomicrmw sub ptr %a, i32 1 seq_cst`

on riscv32a produces:

```
li      a1, -1
amoadd.w.aqrl   a0, a1, (a0)
```

On riscv64a, where the RHS type is narrower than XLEN, it currently
produces:

```
li      a1, 1
neg     a1, a1
amoadd.w.aqrl   a0, a1, (a0)
```

There is still a constant-folding or InstCombiner gap. For instance,
lowering

```
%b = sub i32 %x, %y
%1 = atomicrmw sub ptr %a, i32 %b seq_cst
```

generates:

```
subw    a1, a1, a2
neg     a1, a1
amoadd.w.aqrl   a0, a1, (a0)
```

This sequence could be optimized further to eliminate the redundant neg.
Addressing this may require improvements in the Combiner or Peephole
Optimizer in future work.

---------

Co-authored-by: Kane Wang <kanewang95@foxmail.com>
2025-09-03 08:42:31 -07:00
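The identity this lowering relies on can be shown with `std::atomic` (a C++ analogue with a hypothetical helper name, not the RISC-V backend code). An atomic subtract of `V` is an atomic add of the two's-complement negation `0 - V`, and both return the old value.

```cpp
#include <atomic>
#include <cstdint>

// Atomic subtract expressed as an atomic add of the negated value,
// analogous to "negate the RHS, then amoadd".
uint32_t atomicSubViaAdd(std::atomic<uint32_t> &Mem, uint32_t Val) {
  // 0u - Val is the two's-complement negation; well defined for unsigned.
  return Mem.fetch_add(0u - Val, std::memory_order_seq_cst);  // old value
}
```

Unsigned types are used so the negation and the wrap-around are well defined in C++.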
David Green
4ee80ca29e
[GlobalISel] Add support for scalarizing vector insert and extract elements (#153274)
This adds scalarization (fewerElements) handling for vector insert and
extract, so that i128 and fp128 types can be handled if they make it
past combines. Inserts are unmerged, with the inserted element added to
the remerged vector; extracts are unmerged and then the correct element
is copied into the destination. With a non-constant index, the usual
stack lowering is used.
2025-08-27 10:21:58 +01:00
David Green
c5105c1e0a
[GlobalISel] Fix bitcast fewerElements with scalar narrow types. (#153364)
For a <8 x i32> -> <2 x i128> bitcast, which under AArch64 is split into
two halves, the scalar i128 remainder was causing problems, leading to a
crash with invalid vector types. This makes sure such cases are handled
correctly in fewerElementsBitcast.
2025-08-13 22:27:53 +01:00
Fabian Ritter
d64240b5c6
[GISel] Introduce MachineIRBuilder::(build|materialize)ObjectPtrOffset (#150392)
These functions are for building G_PTR_ADDs when we know that the base
pointer and the result are both valid pointers into (or just after) the
same object. They are similar to SelectionDAG::getObjectPtrOffset.

This PR also changes the call sites of the generic (build|materialize)PtrAdd
functions that implement pointer arithmetic for splitting large memory
accesses to use the new functions. Since memory accesses have to fit into
an object in memory, pointer arithmetic to an offset into a large memory
access also yields an address in that object.

Currently, these (build|materialize)ObjectPtrOffset functions only add
"nuw" to the generated G_PTR_ADD, but I intend to introduce an
"inbounds" MIFlag in a later PR (analogous to a concurrent effort in
SDAG: #131862, related: #140017, #141725) that will also be set in the
(build|materialize)ObjectPtrOffset functions.

Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's
call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where
offsets are now folded into scratch instructions, and cases where the
behavior of the check regeneration script changed, resulting, e.g., in
better checks for "nusw G_PTR_ADD" instructions, matched empty lines,
and the use of "CHECK-NEXT" in MIPS tests.

For SWDEV-516125.
2025-07-29 13:04:04 +02:00
paperchalice
ce86ff105b
[GlobalISel] Remove UnsafeFPMath references (#146319)
This is the GlobalISel part to remove `UnsafeFPMath` flag in CodeGen
pipeline.
2025-07-29 12:11:52 +08:00
Pete Chou
314ce691df
[GlobalISel] Allow Legalizer to lower volatile memcpy family. (#145997)
This change updates the legalizer to allow lowering the volatile memcpy
family, as a target might rely on lowering to legalize them.
2025-07-22 00:42:23 -07:00
Fraser Cormack
a516c60ec3
[NFC] Correct typo: invertion -> inversion (#147995) 2025-07-11 07:37:25 +01:00
David Green
3448e9c075
[AArch64][GlobalISel] Fix lowering of i64->f32 itofp. (#132703)
This is a GISel equivalent of #130665, preventing a double-rounding
issue in sitofp/uitofp by scalarizing i64->f32 converts. Most of the
changes are made in the ActionDefinitionsBuilder for G_SITOFP/G_UITOFP.
Because it is legal to convert i64->f16 itofp without double-rounding,
but not a fpround f64->f16, that variant is lowered to build the two
extends.
2025-07-05 18:13:19 +01:00
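The double-rounding problem can be reproduced in plain C++, assuming the default round-to-nearest-even mode and correctly rounded integer-to-float conversion, as on mainstream targets. The helper names and the sample value are illustrative choices, not taken from the patch. Converting i64 -> f32 directly rounds once; going through f64 rounds twice and can land on a different f32.

```cpp
#include <cstdint>

// Single rounding: i64 -> f32.
float convertDirect(int64_t X) { return static_cast<float>(X); }

// Double rounding: i64 -> f64 -> f32.
float convertViaDouble(int64_t X) {
  return static_cast<float>(static_cast<double>(X));
}

// 2^60 + 2^36 + 1 is just above the midpoint between the floats 2^60 and
// 2^60 + 2^37, so one correct rounding goes up. Rounding to double first
// drops the trailing "+ 1" (f64 spacing there is 2^8), leaving an exact tie
// that round-to-nearest-even resolves down to 2^60.
const int64_t DoubleRoundingCase =
    (int64_t{1} << 60) + (int64_t{1} << 36) + 1;
```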
Pete Chou
13e06403b4
[GlobalISel] Remove dead code. (NFC) (#145811)
LegalizerHelper::lowerMemCpyFamily only accepts G_MEMCPY, G_MEMMOVE, and
G_MEMSET.
2025-06-26 10:48:27 +09:00
JaydeepChauhan14
c3c923c8d6
[X86][GlobalISel] Enable SINCOS with libcall mapping (#142438) 2025-06-25 15:37:33 +09:00
Matt Arsenault
a65e0edd6a
PowerPC: Stop reporting memcpy as an alias of memmove on AIX (#143836)
Instead of reporting ___memmove as an implementation of memcpy,
make it unavailable and let the lowering logic consider memmove as
a fallback path.

This avoids a special case 1:N mapping for libcall implementations.
2025-06-23 22:15:37 +09:00
Matt Arsenault
48155f93dd
CodeGen: Emit error if getRegisterByName fails (#145194)
This avoids using report_fatal_error and standardizes the error
message in a subset of the error conditions.
2025-06-23 16:33:35 +09:00
David Green
437346378f
[GlobalISel] Widen vector loads from aligned ptrs (#144309)
If the pointer is aligned to more than the size of the vector, we can
widen the load up to the next power-of-2 size, as SDAG does.

Some of the v3 tests are currently worse - those should be addressed in
other issues.
2025-06-21 07:42:54 +01:00
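A hedged sketch of the size decision, using hypothetical helpers. It assumes widening is allowed only when the known alignment covers the whole power-of-2 size. Under that assumption the widened access stays inside one aligned block, so it cannot cross into a page the original access did not touch. For example, a 12-byte v3i32 load from a 16-byte-aligned pointer becomes a 16-byte load.

```cpp
#include <cstdint>

// Smallest power of two >= V (expects V > 0).
uint64_t nextPowerOf2(uint64_t V) {
  uint64_t P = 1;
  while (P < V)
    P <<= 1;
  return P;
}

// Hypothetical: bytes to load for a SizeBytes vector at an address known
// to be AlignBytes-aligned. Widen only if the alignment covers the
// widened size; otherwise keep the original size.
uint64_t widenedLoadBytes(uint64_t SizeBytes, uint64_t AlignBytes) {
  uint64_t Widened = nextPowerOf2(SizeBytes);
  return AlignBytes >= Widened ? Widened : SizeBytes;
}
```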
David Green
89f692a24f
[GlobalISel] Split Legalizer debug output into paragraphs. NFC (#143427)
This helps keep the legalizer output easier to read by splitting each
instruction's legalization into a separate block.
2025-06-15 16:43:18 +08:00
Stanley Gambarin
33974b41c7
[GlobalISel] support lowering of G_SHUFFLEVECTOR with pointer args (#141959) 2025-06-05 09:13:51 -07:00
Matt Arsenault
2e2bbcacf8
AMDGPU/GlobalISel: Start legalizing minimumnum and maximumnum (#140900)
This is the bare minimum to get the intrinsic to compile for AMDGPU,
and it's not optimal. We need to follow the existing G_FMINNUM/G_FMAXNUM
custom lowering more closely to handle the IEEE=0 case better.

Just re-use the existing lowering for the old semantics of
G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's
treatment, nor try to handle the general expansion without an underlying
min/max variant (or with G_FMINIMUM/G_FMAXIMUM).
2025-05-21 17:00:45 +02:00
jyli0116
382ad6f2e7
[GISel][AArch64] Added more efficient lowering of Bitreverse (#139233)
GlobalISel was previously inefficient in handling bitreverses of vector
types. This deals with i16, i32, i64 vector types and converts them into
i8 bitreverses and rev instructions.
2025-05-13 11:21:50 +01:00
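The decomposition described above, i8 bitreverses followed by a byte-order reversal, can be checked with a scalar C++ sketch. The helper names are hypothetical, and the sketch models one 32-bit element rather than a whole vector. Reversing a 32-bit value's bits is the same as reversing the bits within each byte and then reversing the order of the bytes.

```cpp
#include <cstdint>

// Reverse the bits within a single byte (an i8 bitreverse).
uint8_t reverseByte(uint8_t B) {
  uint8_t R = 0;
  for (int I = 0; I < 8; ++I)
    if (B & (1u << I))
      R |= static_cast<uint8_t>(1u << (7 - I));
  return R;
}

// 32-bit bitreverse = per-byte bit reversal + byte-order reversal (rev).
uint32_t bitreverse32(uint32_t X) {
  uint32_t Out = 0;
  for (int I = 0; I < 4; ++I) {
    uint8_t Byte = static_cast<uint8_t>(X >> (8 * I));
    // Byte I, bit-reversed, moves to the mirrored byte position 3 - I.
    Out |= static_cast<uint32_t>(reverseByte(Byte)) << (8 * (3 - I));
  }
  return Out;
}
```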
jyli0116
fd80048738
[GlobalISel][AArch64] Handles bitreverse to prevent falling back (#138150)
Handles bitreverse for vector types which were previously falling back
onto SelectionDAG. Includes 8-bit element vectors greater than 128 bits
and less than 64 bits (<32 x i8>, <4 x i8>) and odd vector types such as
<9 x i8>.
2025-05-06 09:57:01 +01:00
Kazu Hirata
cdc9a4b5f8
[CodeGen] Use range-based for loops (NFC) (#138488)
This is a reland of #138434 except that:

- the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp
  have been dropped because they caused a test failure under asan, and

- the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have
  been improved with structured bindings.
2025-05-05 10:08:49 -07:00
Nico Weber
1d955489c3
Revert "[CodeGen] Use range-based for loops (NFC) (#138434)"
This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b.

Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs
(sanitizer builds; macOS; possibly more), see comments on
https://github.com/llvm/llvm-project/pull/138434
2025-05-04 17:36:52 -04:00
Kazu Hirata
47f391fd0e
[CodeGen] Remove unused local variables (NFC) (#138441) 2025-05-04 00:26:37 -07:00
Kazu Hirata
a9699a334b
[CodeGen] Use range-based for loops (NFC) (#138434) 2025-05-04 00:26:19 -07:00
Tobias Stadler
0b5daeb2e5
[GlobalISel] Fix miscompile when narrowing vector loads/stores to non-byte-sized types (#136739)
LegalizerHelper::reduceLoadStoreWidth does not work for non-byte-sized
types, because this would require (un)packing of bits across byte
boundaries.

Precommit tests: #134904
2025-04-29 12:36:34 +01:00
Kazu Hirata
47d8fec9b8
[llvm] Use llvm::append_range (NFC) (#136066)
This patch replaces:

  llvm::copy(Src, std::back_inserter(Dst));

with:

  llvm::append_range(Dst, Src);

for brevity.

One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is an llvm::SmallVector.
2025-04-16 19:30:01 -07:00
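A standard-library-only analogue of the idea, using a hypothetical `appendRange` rather than LLVM's own definition. It inserts the whole range with a single `insert` at the end. With forward iterators, a vector can then size its storage once, instead of growing element by element through `std::back_inserter`.

```cpp
#include <iterator>
#include <vector>

// Append every element of R to the end of C with one range insert.
template <typename Container, typename Range>
void appendRange(Container &C, const Range &R) {
  C.insert(C.end(), std::begin(R), std::end(R));
}
```

Usage mirrors the replacement in the commit: `appendRange(Dst, Src);` instead of `std::copy(Src.begin(), Src.end(), std::back_inserter(Dst));`.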
Kazu Hirata
dc5178cc41
[CodeGen] Use llvm::append_range (NFC) (#135567) 2025-04-13 16:36:03 -07:00
Kazu Hirata
e3a3f78f35
[CodeGen] Use llvm::append_range (NFC) (#133603) 2025-03-29 16:53:02 -07:00
Tim Gymnich
1d0005a69a
[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466)
- rename `GISelKnownBits` to `GISelValueTracking` so that it can analyze
more than just `KnownBits` in the future
2025-03-29 11:51:29 +01:00
Kazu Hirata
f3e8e80563
[llvm] Construct SmallVector with ArrayRef (NFC) (#132560) 2025-03-22 13:11:31 -07:00
David Green
53a395fda3
[AArch64][GlobalISel] Legalize more CTPOP vector types. (#131513)
Similar to other operations, s8, s16, s32 and s64 vector elements are
clamped to legal vector sizes, odd numbers of elements are widened to the
next power of 2, and s128 is scalarized.

This helps legalize cttz as well as ctpop.
2025-03-20 07:21:01 +00:00
David Green
b0876994eb
[AArch64][GlobalISel] Clean up CTLZ vector type legalization. (#131514)
Similar to other operations, s8, s16 and s32 vector elements are clamped
to legal vector sizes, but in this case s64 elements are scalarized to use
the GPR instructions. This allows vector types to be split rather than
scalarized.
2025-03-19 19:28:36 +00:00
David Green
bd1be8a242
[CodeGen][GlobalISel] Add a getVectorIdxWidth and getVectorIdxLLT. (#131526)
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
2025-03-18 08:31:11 +00:00
Craig Topper
caa798cb1e
[GlobalISel] Use Register. NFC 2025-03-02 23:46:18 -08:00
David Green
70ed381b16
[GlobalISel][AArch64] Fix fptoi.sat lowering. (#127901)
The SDAG version uses fminnum/fmaxnum; in converting it to fcmp+select,
it appears the order of the operands was chosen badly. This switches the
conditions used so that the constant stays on the RHS.
2025-02-20 12:22:11 +00:00
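For reference, a hedged C++ model of the saturating semantics being lowered (`llvm.fptosi.sat` to i32). The helper name is hypothetical, and this is not the generated fcmp+select sequence. Each clamp compares the input against a constant bound written on the right-hand side, and NaN maps to 0.

```cpp
#include <cmath>
#include <cstdint>

// Saturating float -> i32 conversion expressed as compares against
// constant bounds (constant on the RHS of every comparison).
int32_t fptosiSat32(float F) {
  if (std::isnan(F))
    return 0;                      // NaN maps to 0
  if (!(F > -2147483648.0f))       // F <= -2^31
    return INT32_MIN;
  if (!(F < 2147483648.0f))        // F >= 2^31
    return INT32_MAX;
  return static_cast<int32_t>(F);  // in range: truncate toward zero
}
```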
Matt Arsenault
37c341df28
Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)"
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed.

This is not a sound approach to dealing with this instruction change.
The new behavior is a different opcode pair, not a modifier on the
existing opcode.
2025-02-20 10:19:14 +07:00
Changpeng Fang
36eaf0daf5
AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)
For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions already
canonicalize their inputs. As a result, we do not need to explicitly
canonicalize the inputs for fminnum/fmaxnum.
2025-02-19 11:16:43 -08:00
JaydeepChauhan14
b693e1cf83
[X86][GlobalISel] Enable G_LROUND/G_LLROUND with libcall mapping (#125096) 2025-02-03 14:12:43 +07:00
David Green
070e129304
[AArch64][GlobalISel] Add disjoint handling for add_and_or_is_add. (#123594)
This allows us to easily detect, without known-bits, that the or in a
fshl/fshr is disjoint, allowing us to use usra under AArch64.
2025-02-02 21:01:49 +00:00
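Why a disjoint `or` matters here: USRA accumulates, that is, adds, a right-shifted value, so it can only replace the `or` when that `or` behaves like an `add`. In a funnel shift, the two shifted halves never share a set bit, so the two forms agree. A scalar C++ sketch with hypothetical helper names, valid for 1 <= S <= 31:

```cpp
#include <cstdint>

// fshl(X, Y, S) combining the halves with OR (1 <= S <= 31).
uint32_t fshlOr(uint32_t X, uint32_t Y, unsigned S) {
  return (X << S) | (Y >> (32 - S));
}

// The same funnel shift combining the halves with ADD. X << S has its low
// S bits clear and Y >> (32 - S) only occupies those S bits, so the OR is
// disjoint and no carries can occur.
uint32_t fshlAdd(uint32_t X, uint32_t Y, unsigned S) {
  return (X << S) + (Y >> (32 - S));
}
```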
David Green
ac7c199a63
[AArch64][GlobalISel] Legalize more G_VECREDUCE_ADD operations. (#123392)
Non-power-2 vectors will now be padded with zero elements, and smaller
vectors will be widened using anyext, which I believe will be better in
many situations than padding with zeros, although some small types may
prefer being scalarized depending on the code. Padding with zeros may
not be best for all sizes (v5i8 being the worst); we can hopefully
improve that in the future, but these types no longer fall back. Other
types like i128 are scalarized.
2025-01-30 22:17:34 +00:00
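Both strategies preserve the reduction result, as this scalar C++ sketch shows. The helper names are hypothetical, v5i8 is the example from the text, and 0xAB stands in for arbitrary anyext high bits. Zero lanes add nothing, and the low 8 bits of a sum depend only on the low 8 bits of each addend.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Reference: wrapping i8 add-reduction over N lanes.
uint8_t reduceAdd(const uint8_t *Lanes, std::size_t N) {
  uint8_t Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum = static_cast<uint8_t>(Sum + Lanes[I]);
  return Sum;
}

// Padding: v5i8 extended to v8i8 with zero lanes.
uint8_t reduceAddPadded(const std::array<uint8_t, 5> &V) {
  std::array<uint8_t, 8> Padded{};  // value-initialized: all lanes zero
  for (std::size_t I = 0; I < V.size(); ++I)
    Padded[I] = V[I];
  return reduceAdd(Padded.data(), Padded.size());
}

// anyext widening: each i8 lane becomes an i16 lane with arbitrary high
// bits (0xAB here); truncating the wide sum gives the i8 result.
uint8_t reduceAddAnyExt(const std::array<uint8_t, 5> &V) {
  uint16_t Sum = 0;
  for (uint8_t L : V)
    Sum = static_cast<uint16_t>(Sum + (0xAB00u | L));
  return static_cast<uint8_t>(Sum);
}
```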
Amara Emerson
2d53eaff4a
[AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores.
This case differs from the <8 x i1> case handled earlier because it triggers
a legalization failure in lowerStore() that's intended for scalar code.

It was also triggering incorrect bitcast actions in the AArch64 rules that weren't
expecting truncating stores.

With these two fixed, more cases are handled. The code is still bad, including
some missing load promotion in our combiners that results in dead stores hanging
around at the end of codegen. Again, we can fix these in separate changes.

Reviewers: davemgreen, madhur13490, topperc, arsenm

Reviewed By: davemgreen

Pull Request: https://github.com/llvm/llvm-project/pull/121185
2025-01-06 10:22:48 -08:00
Amara Emerson
6b0807fe2b
[AArch64][GlobalISel] Add support for lowering trunc stores of vector bools.
This is essentially a port of TargetLowering::scalarizeVectorStore(), which
is used for the case where we have something like a store of <8 x s8> truncating
to <8 x s1> in memory. The naive lowering is a sequence of extracts to compute
a scalar value to store.

AArch64's DAG implementation has some more smarts to improve this further which
we can do later.

Reviewers: topperc, davemgreen

Pull Request: https://github.com/llvm/llvm-project/pull/121169
2025-01-06 10:21:42 -08:00
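The naive lowering, a sequence of extracts combined into one scalar, can be pictured with this C++ sketch. `packBoolVector` is a hypothetical name, and the sketch assumes element 0 lands in bit 0 of the stored byte, as on a little-endian layout. Each <8 x s8> element is truncated to its low bit and shifted into place.

```cpp
#include <array>
#include <cstdint>

// Truncating store of <8 x s8> as <8 x s1>: extract each element, keep
// only bit 0, and OR it into position I of a single byte.
uint8_t packBoolVector(const std::array<uint8_t, 8> &Elts) {
  uint8_t Packed = 0;
  for (unsigned I = 0; I < 8; ++I)
    Packed = static_cast<uint8_t>(Packed | ((Elts[I] & 1u) << I));
  return Packed;
}
```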
Amara Emerson
41ebbed280
[AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack.
Reviewers: davemgreen, topperc, arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/121171
2025-01-05 21:32:27 -08:00
Amara Emerson
7e3180a2c2
[AArch64][GlobalISel] Add support for widening vector store elements to s8.
Reviewers: topperc, arsenm, davemgreen

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/121170
2025-01-05 21:31:34 -08:00
Craig Topper
54dac27c57
[GISel][RISCV] Use isSExtCheaperThanZExt when widening G_UMAX/G_UMIN. (#120041)
Similar to what we do for unsigned comparisons after #120032.
2024-12-15 23:16:58 -08:00
Craig Topper
115872902b
[GISel][RISCV] Use isSExtCheaperThanZExt when widening G_ICMP. (#120032)
Sign extending i32->i64 is more efficient than zero extend for RV64.
2024-12-15 22:55:58 -08:00
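Why sign extension is a valid way to widen unsigned compares and G_UMAX/G_UMIN: sext maps [0, 2^31) to itself and [2^31, 2^32) to [2^64 - 2^31, 2^64), so unsigned order is preserved. A C++ sketch with hypothetical helper names follows; on RV64 the sign extension corresponds to the cheap `sext.w`.

```cpp
#include <cstdint>

// Widen a 32-bit value to 64 bits by sign extension.
uint64_t sext32To64(uint32_t V) {
  // uint32_t -> int32_t is modular on mainstream compilers (guaranteed in C++20).
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(V)));
}

// i32 unsigned less-than evaluated on sign-extended i64 operands.
bool ultViaSext(uint32_t A, uint32_t B) {
  return sext32To64(A) < sext32To64(B);
}

// i32 unsigned max evaluated on sign-extended operands, then truncated.
uint32_t umaxViaSext(uint32_t A, uint32_t B) {
  uint64_t WA = sext32To64(A), WB = sext32To64(B);
  return static_cast<uint32_t>(WA > WB ? WA : WB);
}
```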