2646 Commits

Author SHA1 Message Date
AZero13
f94290cbff
[ValueTracking][GlobalISel] UCMP and SCMP cannot create undef or poison (#154404)
They cannot make poison or undef, same for IR. They can only make -1, 0,
or 1

Alive 2: https://alive2.llvm.org/ce/z/--Jd78
2025-08-20 08:41:27 +09:00
David Green
03912a1de5
[GlobalISel] Translate scalar sequential vecreduce.fadd/fmul as fadd/fmul. (#153966)
A llvm.vector.reduce.fadd(float, <1 x float>) will be translated to
G_VECREDUCE_SEQ_FADD with two scalar operands, which is illegal
according to the verifier. This makes sure we generate a fadd/fmul
instead.
2025-08-18 14:59:44 +00:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
2025-08-18 07:01:29 -07:00
Diana Picus
ac005e16f6
Reapply "[AMDGPU] Intrinsic for launching whole wave functions" (#153584)
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a cmake issue which has been
discussed in more detail in this Discourse post:

https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901

If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:

https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds

The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Tail calls are handled in a future patch.
2025-08-15 10:12:47 +02:00
David Green
c5105c1e0a
[GlobalISel] Fix bitcast fewerElements with scalar narrow types. (#153364)
For a <8 x i32> -> <2 x i128> bitcast, that under aarch64 is split into
two halfs, the scalar i128 remainder was causing problems, causing a
crash with invalid vector types. This makes sure they are handled
correctly in fewerElementsBitcast.
2025-08-13 22:27:53 +01:00
Daniel Paoliello
c430e06fb5
[win][arm64ec] Fix duplicate errors with the dontcall attribute (#152810)
Since the `dontcall-*` attributes are checked both by
`FastISel`/`GlobalISel` and `SelectionDAGBuilder`, and both `FastISel`
and `GlobalISel` bail for calls on Arm64EC for AFTER doing the check, we
ended up emitting duplicate copies of this error.

This change moves the checking for `dontcall-*` in `FastISel` and
`GlobalISel` to after it has been successfully lowered.
2025-08-12 11:05:07 -07:00
Fabian Ritter
96775e9229
[GISel] Handle Flags in G_PTR_ADD Combines (#152495)
So far, GlobalISel's G_PTR_ADD combines have ignored MIFlags like nuw, nusw,
and inbounds. That was in many cases unnecessarily conservative and in others
unsound, since reassociations re-used the existing G_PTR_ADD instructions
without invalidating their flags. This patch aims to improve that.

I've checked the transforms in this PR with Alive2 on corresponding middle-end
IR constructs.

A longer-term goal would be to encapsulate the logic that determines which
GEP/ISD::PTRADD/G_PTR_ADD flags can be preserved in which case, since this
occurs in similar forms in the middle end, the SelectionDAG combines, and the
GlobalISel combines here.

For SWDEV-516125.
2025-08-11 10:34:45 +02:00
Nikita Popov
e92b7e9641
[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from a i128 argument.

The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).

This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet, this just does
the mechanical changes to thread through the new argument.
2025-08-11 08:57:53 +02:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
2025-08-07 09:12:40 +02:00
Diana Picus
14cd133931
Revert "[AMDGPU] Intrinsic for launching whole wave functions" (#152286)
Reverts llvm/llvm-project#145859 because it broke a HIP test:
```
[34/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
FAILED: External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o 
/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/opt/botworker/llvm/External/hip/rocm-6.3.0 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /home/botworker/bbot/clang-hip-vega20/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
fatal error: error in backend: Cannot select: intrinsic %llvm.amdgcn.readfirstlane
```
2025-08-06 12:24:52 +02:00
Diana Picus
0461cd3d1d
[AMDGPU] Intrinsic for launching whole wave functions (#145859)
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Unspeakable horrors happen around calls from whole wave functions, the
plan is to improve the handling of caller/callee-saved registers in
a future patch.

Tail calls are also handled in a future patch.
2025-08-06 10:25:53 +02:00
Kazu Hirata
94dc3c6c49
[GlobalISel] Remove an unnecessary cast (NFC) (#152086)
getImm() already returns int64_t.
2025-08-05 07:39:06 -07:00
Fabian Ritter
95191d5460
[GISel] Set more MIFlags when translating GEPs (#151708)
The IRTranslator sets the flags now more consistently with
`SelectionDAGBuilder::visitGetElementPtr()`. This affects `nuw` and `nusw`, as
well as the recently introduced `inbounds` MIFlag (see PR #150900).

This PR also adds more tests to `AArch64/GlobalISel/irtranslator-gep-flags.ll`
to cover all points in `IRTranslator::translateGetElementPtr` that set flags.

For SWDEV-516125.
2025-08-04 13:25:33 +02:00
jyli0116
961a4aabf8
[GlobalISel] Add constant matcher for APInt (#151357)
Changed m_SpecificICst, m_SpecificICstSplat and m_SpecificICstorSplat to
match against APInt as well.
2025-08-04 09:47:21 +01:00
Nikita Popov
86727fe9a1
[IR] Allow poison argument to lifetime markers (#151148)
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.

It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.

Fixes https://github.com/llvm/llvm-project/issues/151119.
2025-08-04 10:02:04 +02:00
Fabian Ritter
ef6eaa045a
[GISel] Introduce MIFlags::InBounds (#150900)
This flag applies to G_PTR_ADD instructions and indicates that the operation
implements an inbounds getelementptr operation, i.e., the pointer operand is in
bounds wrt. the allocated object it is based on, and the arithmetic does not
change that.

It is set when the IRTranslator lowers inbounds GEPs (currently only in some
cases, to be extended with a future PR), and in the
(build|materialize)ObjectPtrOffset functions.

Inbounds information is useful in ISel when we have instructions that perform
address computations whose intermediate steps must be in the same memory region
as the final result. A follow-up patch will start using it for AMDGPU's flat
memory instructions, where the immediate offset must not affect the memory
aperture of the address.

This is analogous to a concurrent effort in SDAG: #131862
(related: #140017, #141725).

For SWDEV-516125.
2025-07-30 13:01:23 +02:00
Fabian Ritter
d64240b5c6
[GISel] Introduce MachineIRBuilder::(build|materialize)ObjectPtrOffset (#150392)
These functions are for building G_PTR_ADDs when we know that the base
pointer and the result are both valid pointers into (or just after) the
same object. They are similar to SelectionDAG::getObjectPtrOffset.

This PR also changes call sites of the generic (build|materialize)PtrAdd
functions that implement pointer arithmetic to split large memory
accesses to the new functions. Since memory accesses have to fit into an
object in memory, pointer arithmetic to an offset into a large memory
access also yields an address in that object.

Currently, these (build|materialize)ObjectPtrOffset functions only add
"nuw" to the generated G_PTR_ADD, but I intend to introduce an
"inbounds" MIFlag in a later PR (analogous to a concurrent effort in
SDAG: #131862, related: #140017, #141725) that will also be set in the
(build|materialize)ObjectPtrOffset functions.

Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's
call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where
offsets are now folded into scratch instructions, and cases where the
behavior of the check regeneration script changed, resulting, e.g., in
better checks for "nusw G_PTR_ADD" instructions, matched empty lines,
and the use of "CHECK-NEXT" in MIPS tests.

For SWDEV-516125.
2025-07-29 13:04:04 +02:00
paperchalice
ce86ff105b
[GlobalISel] Remove UnsafeFPMath references (#146319)
This is the GlobalISel part to remove `UnsafeFPMath` flag in CodeGen
pipeline.
2025-07-29 12:11:52 +08:00
Nikita Popov
a7a1df8f72
[CodeGen] Remove handling for lifetime.start/end on non-alloca (#149838)
After https://github.com/llvm/llvm-project/pull/149310 we are guaranteed
that the argument is an alloca, so we don't need to look at underlying
objects (which was not a correct thing to do anyway).

This also drops the offset argument for lifetime nodes in SDAG. The
offset is fixed to zero now. (Peculiarly, while SDAG pretended to have
an offset, it just gets silently dropped during selection.)
2025-07-22 09:44:59 +02:00
Pete Chou
314ce691df
[GlobalISel] Allow Legalizer to lower volatile memcpy family. (#145997)
This change updates legalizer to allow lowering volatile memcpy family
as a target might rely on lowering to legalize them.
2025-07-22 00:42:23 -07:00
David Green
3b8adcfd92
[GlobalISel] Add computeNumSignBits for ASHR (#139503) 2025-07-21 10:23:26 +01:00
jyli0116
fc5c5a934d
[GlobalISel] Allow expansion of srem by constant in prelegalizer (#148845)
This patch allows srem by a constant to be expanded more efficiently to
avoid the need for expensive sdiv instructions. This is the last part of
the patches which fixes #118090
2025-07-17 14:43:58 +01:00
jyli0116
806028add1
[GlobaISel] Allow expanding of sdiv -> mul by constant (#146504)
Allows expand of sdiv->mul by constant combine for the general case.
Previously this was only occurring in the exact case. This is part of
the resolution to issue #118090
2025-07-14 15:01:12 +01:00
JaydeepChauhan14
0f0079c29d
[X86][GlobalISel] Added support for llvm.get.rounding (#147716)
- This implementation is adapted from SDAG
X86TargetLowering::LowerGET_ROUNDING.
- llvm.set.rounding will be added later because it involves MXCSR
updates currently unsupported.
2025-07-11 15:48:18 +02:00
Fraser Cormack
a516c60ec3
[NFC] Correct typo: invertion -> inversion (#147995) 2025-07-11 07:37:25 +01:00
Kazu Hirata
16435a87b6
[CodeGen] Remove an unnecessary cast (NFC) (#147155)
Offset is already of int64_t.
2025-07-05 12:26:35 -07:00
David Green
3448e9c075
[AArch64][GlobalISel] Fix lowering of i64->f32 itofp. (#132703)
This is a GISel equivalent of #130665, preventing a double-rounding
issue in sitofp/uitofp by scalarizing i64->f32 converts. Most of the
changes are made in the ActionDefinitionsBuilder for G_SITOFP/G_UITOFP.
Because it is legal to convert i64->f16 itofp without double-rounding,
but not a fpround f64->f16, that variant is lowered to build the two
extends.
2025-07-05 18:13:19 +01:00
jyli0116
9c0743fbc5
[GlobalISel] Allow expansion of urem by constant in prelegalizer (#145914)
This patch allows urem by a constant to be expanded more efficiently to
avoid the need for expensive udiv instructions. This is part of the
resolution to issue #118090
2025-07-02 13:46:36 +01:00
Kazu Hirata
0d0daef6ee
[GlobalISel] Remove an unnecessary cast (NFC) (#146249)
Idx is already of unsigned.
2025-06-28 20:41:17 -07:00
Matt Arsenault
7e2e030121
GlobalISel: Replace use of report_fatal_error (#145866) 2025-06-27 21:16:23 +09:00
Daniel Man
045b827367
[GlobalISel] Use-Vector-Truncate Opt Needs Elt Type Check (#146003)
In the pre-legalizer combiner, there exists a bug with UseVectorTruncate
match-apply optimization. When the destinations' types do not match the
vector element type of the G_UNMERGE_VALUES instruction, the resulting
collapsed truncate does not preserve original functional behavior. This
commit introduces a simple type check to ensure that the destination
types match the vector element type.
2025-06-27 16:41:22 +09:00
Pete Chou
13e06403b4
[GlobalISel] Remove dead code. (NFC) (#145811)
LegalizerHelper::lowerMemCpyFamily only execpts G_MEMCPY, G_MEMMOVE, and
G_MMSET.
2025-06-26 10:48:27 +09:00
JaydeepChauhan14
c3c923c8d6
[X86][GlobalISel] Enable SINCOS with libcall mapping (#142438) 2025-06-25 15:37:33 +09:00
Matt Arsenault
a65e0edd6a
PowerPC: Stop reporting memcpy as an alias of memmove on AIX (#143836)
Instead of reporting ___memmove as an implementation of memcpy,
make it unavailable and let the lowering logic consider memmove as
a fallback path.

This avoids a special case 1:N mapping for libcall implementations.
2025-06-23 22:15:37 +09:00
Nikita Popov
e56384ff54 [IRTranslator] Remove unnecessary isIntrinsic() check (NFC)
Directly call getIntrinsicID(), there is no need to check for
isIntrinsic() first.
2025-06-23 12:43:19 +02:00
Matt Arsenault
48155f93dd
CodeGen: Emit error if getRegisterByName fails (#145194)
This avoids using report_fatal_error and standardizes the error
message in a subset of the error conditions.
2025-06-23 16:33:35 +09:00
David Green
437346378f
[GlobalISel] Widen vector loads from aligned ptrs (#144309)
If the pointer is aligned to more than the size of the vector, we can
widen the load up to next power of 2 size, as SDAG performs.

Some of the v3 tests are currently worse - those should be addressed in
other issues.
2025-06-21 07:42:54 +01:00
Kazu Hirata
dad64877c8
[llvm] Remove an extraneous cast (NFC) (#144955)
llvm::CallBase::getArgOperand returns Value *, so we do not need
const_cast<Value *>.
2025-06-19 16:24:46 -07:00
Kazu Hirata
05cd32adb7
[llvm] Remove unused includes (NFC) (#144293)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-06-16 08:59:18 -07:00
David Green
89f692a24f
[GlobalISel] Split Legalizer debug ouput into paragraphs. NFC (#143427)
This helps keep the legalizer output easier to read, splitting each
instructions legalization into a separate block.
2025-06-15 16:43:18 +08:00
Stephen Tozer
a08a831515
[DLCov][NFC] Propagate annotated DebugLocs through transformations (#138047)
Part of the coverage-tracking feature, following #107279.

In order for DebugLoc coverage testing to work, we firstly have to set
annotations for intentionally-empty DebugLocs, and secondly we have to
ensure that we do not drop these annotations as we propagate DebugLocs
throughout compilation. As the annotations exist as part of the DebugLoc
class, and not the underlying DILocation, they will not survive a
DebugLoc->DILocation->DebugLoc roundtrip. Therefore this patch modifies
a number of places in the compiler to propagate DebugLocs directly
rather than via the underlying DILocation. This has no effect on the
output of normal builds; it only ensures that during coverage builds, we
do not drop incorrectly annotations and therefore create false
positives.

The bulk of these changes are in replacing
DILocation::getMergedLocation(s) with a DebugLoc equivalent, and in
changing the IRBuilder to store a DebugLoc directly rather than storing
DILocations in its general Metadata array. We also use a new function,
`DebugLoc::orElse`, which selects the "best" DebugLoc out of a pair
(valid location > annotated > empty), preferring the current DebugLoc on
a tie - this encapsulates the existing behaviour at a few sites where we
_may_ assign a DebugLoc to an existing instruction, while extending the
logic to handle annotation DebugLocs at the same time.
2025-06-12 14:06:27 +01:00
Tim Gymnich
5aed4800f3
[GISel] KnownFPClass ValueTracking fix handling of vectors (#143372) 2025-06-12 14:43:40 +02:00
Jason Eckhardt
bf53a49492
[GISel][NFC] Use ranged-for/enumerate in a few places. (#143185)
Follow-up to https://github.com/llvm/llvm-project/pull/143113.
2025-06-07 08:49:53 -05:00
Usha Gupta
d5a1f49827
[GISel] [NFC] Capitalize loop indices in GISelValueTracking.cpp for style consistency (#143113)
Following up on a comment on
https://github.com/llvm/llvm-project/pull/142355.
Updated other instances in the file as well.

@jayfoad
2025-06-06 23:14:50 +09:00
Guy David
4d4b7cc69e
[AArch64] Skip storing of stack arguments when lowering tail calls (#126735)
This issue starts in the selection DAG and causes the backend to emit
the following for a trivial tail call:
```
ldr w8, [sp]
str w8, [sp]
b func
```

I'm not too sure that checking for immutability of a specific stack
object is a good enough of a gurantee, because as soon a tail-call is
done lowering,`setHasTailCall()` is called and in that case perhaps a
pass is allowed to change the value of the object in-memory?

This can be extended to the ARM backend as well.
Removed the `tailcall` keyword from a few other test assets, I'm
assuming their original intent was left intact.
2025-06-06 11:26:24 +03:00
Stanley Gambarin
33974b41c7
[GlobalISel] support lowering of G_SHUFFLEVECTOR with pointer args (#141959) 2025-06-05 09:13:51 -07:00
Usha Gupta
cf348e886d
[GlobalISel] Add G_CONCAT_VECTOR handling in computeNumSignBits (#142355)
Code ported from SelectionDAG::ComputeNumSignBits
2025-06-04 11:11:18 +01:00
Usha Gupta
7c996012ce
[GlobalISel] Add G_CONCAT_VECTOR computeKnownBits (#141933)
Code ported from SelectionDAG::computeKnownBits.
2025-05-30 10:44:59 +01:00
Kazu Hirata
3bc174ba77
[CodeGen] Remove unused includes (NFC) (#141320)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 00:00:00 -07:00