14368 Commits

Author SHA1 Message Date
Nikita Popov
e92b7e9641
[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from an i128 argument.

The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).

This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet; this just makes
the mechanical changes to thread through the new argument.
2025-08-11 08:57:53 +02:00
Yingwei Zheng
62735d26b1
[DAGCombine] Correctly extend the constant RHS in TargetLowering::SimplifySetCC (#152862)
In https://github.com/llvm/llvm-project/pull/150270, when the predicate
is eq/ne and the trunc has only an nsw flag, the RHS is incorrectly
zero-extended.

Closes https://github.com/llvm/llvm-project/issues/152630.
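
A minimal sketch of why the extension kind matters, written as plain C++ rather than DAG code. It assumes an i32 value truncated to i8 with only `nsw`, so the wide value equals the sign extension of the narrow one. The helper names are hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Models `setcc eq (trunc nsw i32 x to i8), c`. Callers must respect the
// nsw assumption: x already fits in a signed 8-bit value.
bool compareNarrow(int32_t x, int8_t c) {
  return static_cast<int8_t>(x) == c;
}

// Folded form that compares in the wide type with a sign-extended RHS.
bool compareWideSext(int32_t x, int8_t c) {
  return x == static_cast<int32_t>(c);
}

// Folded form with a zero-extended RHS, which is wrong under nsw alone.
bool compareWideZext(int32_t x, int8_t c) {
  return x == static_cast<int32_t>(static_cast<uint8_t>(c));
}
```

For `x = -1` and `c = -1`, the narrow compare and the sign-extended form both hold, while the zero-extended form compares -1 against 255 and fails.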
2025-08-10 01:24:37 +08:00
Alexander Richardson
3a4b351ba1
[IR] Introduce the ptrtoaddr instruction
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:

1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
   index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/139357
2025-08-08 10:12:39 -07:00
woruyu
95b16d1264
[DAG] Fold trunc(abdu(x,y)) and trunc(abds(x,y)) if they have sufficient leading zero/sign bits (#151471)
This PR resolves https://github.com/llvm/llvm-project/issues/147683

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-08 10:43:14 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Benjamin Maxwell
94c48a21bb
[AArch64][SVE] Fix hang in VECTOR_HISTOGRAM DAG combine (#152539)
The histogram DAG combine went into an infinite loop of creating the
same histogram node due to an incorrect use of the `refineUniformBase`
and `refineIndexType` APIs.

These APIs take SDValues by reference (SDValue&) and return `true` if
they were "refined" (i.e., set to new values).

Previously, this DAG combine would create the `Ops` array (used to
create the new histogram node) before calling the `refine*` APIs, which
copies the SDValues into the array, meaning the updated values were not
used to create the new histogram node.

Reproducer: https://godbolt.org/z/hsGWhTaqY (it will timeout)
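
The ordering problem described above can be sketched outside of LLVM. This is an illustration only: `refineValue` is a hypothetical stand-in for an API that updates its argument by reference, and plain `int`s stand in for SDValues.

```cpp
#include <array>
#include <cassert>

// Hypothetical "refine" API: halves even values in place and reports
// whether it changed anything.
bool refineValue(int &v) {
  if (v % 2 == 0) {
    v /= 2;
    return true;
  }
  return false;
}

// Buggy ordering: the array copies the values before they are refined,
// so the refined values never reach the result.
std::array<int, 2> buildOpsTooEarly(int base, int index) {
  std::array<int, 2> ops = {base, index};
  refineValue(base);
  refineValue(index);
  return ops;
}

// Fixed ordering: refine first, then build the operand array.
std::array<int, 2> buildOpsAfterRefine(int base, int index) {
  refineValue(base);
  refineValue(index);
  return {base, index};
}
```

With inputs `(8, 3)`, the first function still returns 8 as its first operand, while the second returns the refined 4. Rebuilding a node from stale operands is how a combine can keep producing the same node.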
2025-08-08 09:59:24 +01:00
David Stuttard
c7c0229480
Revert "[AMDGPU] SelectionDAG divergence tracking should take into account Target divergency. (#147560)" (#152548)
This reverts commit 9293b65a616b8de432a654d046e802540b146372.
2025-08-08 09:05:59 +01:00
zhijian lin
093439c688
[PowerPC][AIX] Using millicode for memcmp instead of libcall (#147093)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __memcmp routine is a millicode implementation;
we use millicode for the memcmp function instead of a library call to
improve performance.
2025-08-07 13:13:56 -04:00
Chaitanya Koparkar
6ce68d3a12
[DAG] canCreateUndefOrPoison - add FP_EXTEND (#152249)
Fixes https://github.com/llvm/llvm-project/issues/152141
2025-08-07 09:23:46 +01:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
2025-08-07 09:12:40 +02:00
Craig Topper
57045a137f
[DAGCombiner] Avoid repeated calls to WideVT.getScalarSizeInBits() in DAGCombiner::mergeTruncStores. NFC (#152231)
We already have a variable, WideNumBits, that contains the same
information. Use it and delay the creation of WideVT until we really
need it.
2025-08-06 09:10:02 -07:00
Simon Pilgrim
c4f6d34674
[DAG] getNode - fold (sext (trunc x)) -> x iff the upper bits are already signbits (#151945)
Similar to what we already do for ZERO_EXTEND/ANY_EXTEND patterns.
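
A small sketch of the condition, assuming an i32 value truncated to i16 and sign-extended back. The function names are hypothetical. The fold is only valid when bits 31..16 already repeat bit 15.

```cpp
#include <cassert>
#include <cstdint>

// sext(trunc x) for i32 -> i16 -> i32, written without relying on
// implementation-defined narrowing conversions.
int32_t sextOfTrunc16(int32_t x) {
  uint32_t low = static_cast<uint32_t>(x) & 0xFFFFu;
  return static_cast<int32_t>(low ^ 0x8000u) - 0x8000;
}

// True when bits 31..16 are all copies of bit 15, i.e. x fits in i16.
bool upperBitsAreSignBits(int32_t x) {
  return x >= INT16_MIN && x <= INT16_MAX;
}
```

When `upperBitsAreSignBits(x)` holds, `sextOfTrunc16(x)` returns `x` unchanged. A value such as 70000 fails the check and is not preserved.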
2025-08-06 14:55:46 +01:00
Diana Picus
14cd133931
Revert "[AMDGPU] Intrinsic for launching whole wave functions" (#152286)
Reverts llvm/llvm-project#145859 because it broke a HIP test:
```
[34/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
FAILED: External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o 
/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/opt/botworker/llvm/External/hip/rocm-6.3.0 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /home/botworker/bbot/clang-hip-vega20/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
fatal error: error in backend: Cannot select: intrinsic %llvm.amdgcn.readfirstlane
```
2025-08-06 12:24:52 +02:00
Diana Picus
0461cd3d1d
[AMDGPU] Intrinsic for launching whole wave functions (#145859)
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Unspeakable horrors happen around calls from whole wave functions; the
plan is to improve the handling of caller/callee-saved registers in
a future patch.

Tail calls are also handled in a future patch.
2025-08-06 10:25:53 +02:00
Alex MacLean
d27802a217
[DAGCombiner] Fold setcc of trunc, generalizing some NVPTX isel logic (#150270)
This change adds support for folding a SETCC when one or both of the
operands is a TRUNCATE with the appropriate no-wrap flags. This pattern
can occur when promoting i8 operations in NVPTX, and we currently have
some ISel rules to try to handle it.
2025-08-05 19:20:17 -07:00
Craig Topper
73685583c8
[VP][RISCV] Add a vp.load.ff intrinsic for fault only first load. (#128593)
There's been some interest in supporting early-exit loops recently.
https://discourse.llvm.org/t/rfc-supporting-more-early-exit-loops/84690

This patch was extracted from our downstream where we've been using it
in our vectorizer.
2025-08-05 16:12:42 -07:00
Simon Pilgrim
9f50224b25
[DAG] Remove Depth=1 hack from isGuaranteedNotToBeUndefOrPoison checks (#152127)
Now that #146490 removed the assertion in visitFreeze to assert that the
node was still isGuaranteedNotToBeUndefOrPoison, we no longer need this
reduced depth hack (which had to account for the difference in depth of
freeze(op()) vs op(freeze())).

Helps with some of the minor regressions in #150017
2025-08-05 13:35:04 +01:00
Simon Pilgrim
d561259a08
[DAG] visitFREEZE - replace multiple frozen/unfrozen uses of an SDValue with just the frozen node (#150017)
Similar to InstCombinerImpl::freezeOtherUses, attempt to ensure that we
merge multiple frozen/unfrozen uses of a SDValue. This fixes a number of
hasOneUse() problems when trying to push FREEZE nodes through the DAG.

Remove SimplifyMultipleUseDemandedBits handling of FREEZE nodes as we
now want to keep the common node, and not bypass for some nodes just
because of DemandedElts.

Fixes #149799
2025-08-05 09:24:09 +01:00
Craig Topper
a3a8e1c064
[TargetLowering][RISCV] Use sra for (X & -256) == 256 -> (X >> 8) == 1 if it yields a better icmp constant. (#151762)
If using srl does not produce a legal constant for the RHS of the
final compare, try to use sra instead.

Because the AND constant is negative, the sign bits participate in the
compare. Using an arithmetic shift right duplicates that bit.
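
A sketch of the equivalence for 32-bit values, assuming the compare constant has its low 8 bits clear. The helper names are hypothetical and this is ordinary C++, not the TargetLowering code.

```cpp
#include <cassert>
#include <cstdint>

// Portable arithmetic shift right by 8 (avoids relying on how a given
// compiler shifts negative signed values).
int32_t sra8(int32_t x) { return x >= 0 ? (x >> 8) : ~((~x) >> 8); }

// Logical shift right by 8.
uint32_t srl8(int32_t x) { return static_cast<uint32_t>(x) >> 8; }

// Original form: (x & -256) == c, where c has its low 8 bits clear.
bool maskedCompare(int32_t x, int32_t c) { return (x & -256) == c; }

// Rewritten forms: shift both sides and compare the upper 24 bits.
bool srlCompare(int32_t x, int32_t c) { return srl8(x) == srl8(c); }
bool sraCompare(int32_t x, int32_t c) { return sra8(x) == sra8(c); }
```

Both rewritten forms agree with the masked compare. They differ in the constant they leave behind: for `c = -256` the logical shift yields `0x00FFFFFF`, while the arithmetic shift yields `-1`, which is the kind of small constant a target may be able to encode directly.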
2025-08-04 09:00:41 -07:00
woruyu
38bfe9ae56
[DAG] combineVSelectWithAllOnesOrZeros - missing freeze (#150388)
This PR resolves https://github.com/llvm/llvm-project/issues/150069

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-04 15:55:12 +01:00
Simon Pilgrim
5c2054a4ea
[DAG] getMinMaxOpcodeForFP - split if-else chain. NFC. (#151938)
(style) All cases return so split the chain
2025-08-04 15:32:08 +01:00
Abhishek Kaushik
1c0ac80d4a
[DAG] Combine store + vselect to masked_store (#145176)
Add a new combine to replace
```
(store ch (vselect cond truevec (load ch ptr offset)) ptr offset)
```
with
```
(mstore ch truevec ptr offset cond)
```

This saves a blend operation on targets that support conditional stores.
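
The equivalence behind this combine can be sketched with fixed-size arrays standing in for vectors and memory. The type aliases and function names are hypothetical, and the model ignores chains and offsets.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>;

// store(vselect(cond, trueVec, load(mem)), mem): load, blend, store back.
void storeOfSelect(Vec4 &mem, const Vec4 &trueVec, const Mask4 &cond) {
  Vec4 loaded = mem;
  Vec4 blended{};
  for (std::size_t i = 0; i < blended.size(); ++i)
    blended[i] = cond[i] ? trueVec[i] : loaded[i];
  mem = blended;
}

// mstore(trueVec, mem, cond): write only the lanes selected by the mask.
void maskedStore(Vec4 &mem, const Vec4 &trueVec, const Mask4 &cond) {
  for (std::size_t i = 0; i < mem.size(); ++i)
    if (cond[i])
      mem[i] = trueVec[i];
}
```

Both functions leave memory in the same state. The masked form never needs the loaded value or the blend.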
2025-08-04 19:05:36 +05:30
Nikita Popov
86727fe9a1
[IR] Allow poison argument to lifetime markers (#151148)
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.

It's worth noting that this does not require any conservative
assumptions: lifetimes with poison arguments can simply be skipped.

Fixes https://github.com/llvm/llvm-project/issues/151119.
2025-08-04 10:02:04 +02:00
Min-Yih Hsu
7ebbbd885f
[DAG] Always use stack to promote bitcast when the source is vector (#151065)
The optimization introduced by #125637 tried to avoid using the stack to
promote bitcasts with a vector result type. However, this is not correct
if the input type is also a vector. This patch limits that optimization
to scalar-to-vector bitcasts.
2025-08-02 15:32:10 -07:00
Craig Topper
f952a84f2f
[TargetLowering] Use getShiftAmountConstant in buildSDIVPow2WithCMov.
2025-08-02 10:50:46 -07:00
AZero13
23022a4683
[SelectionDAG] Move sign pattern check from AArch64 and ARM to general SelectionDAG (#151736)
This works on all cases much like the XOR case above it in SelectionDAG.
2025-08-01 14:46:51 -07:00
Paul Walker
ceb2b9c141
[LLVM][DAGCombiner] fold (shl (X * vscale(C0)), C1) -> (X * vscale(C0 << C1)). (#150651)
2025-08-01 11:42:45 +01:00
黃國庭
f04ea2ef1c
Add m_SelectCCLike matcher to match SELECT_CC or SELECT with SETCC (#149646)
Fix #147282 and follow-up to #148834

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-01 10:12:05 +01:00
David Sherwood
05b16aff0f
[DAGCombiner] Add combine for vector interleave of splats (#151110)
This patch adds two DAG combines:

1. vector_interleave(splat, splat, ...) -> {splat,splat,...}
2. concat_vectors(splat, splat, ...) -> wide_splat

where all the input splats are identical. Both of these
together enable us to fold
  concat_vectors(vector_interleave(splat, splat, ...))
into a wide splat. Post-legalisation we must only do the
concat_vectors combine if the wider type and splat operation
are legal.
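
A rough model of the combined fold, using `std::vector<int>` in place of DAG vector types. The helpers are hypothetical. The second function returns the interleaved lanes already concatenated, which corresponds to `concat_vectors(vector_interleave(...))`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// A vector with n lanes that all hold the same value.
Vec splat(int value, std::size_t n) { return Vec(n, value); }

// Interleaves the lanes of all inputs (lane 0 of each input, then lane 1
// of each input, and so on) and returns the result as one wide vector.
// All inputs are assumed to have the same length.
Vec interleaveAndConcat(const std::vector<Vec> &inputs) {
  Vec out;
  if (inputs.empty())
    return out;
  const std::size_t lanes = inputs[0].size();
  for (std::size_t lane = 0; lane < lanes; ++lane)
    for (const Vec &in : inputs)
      out.push_back(in[lane]);
  return out;
}
```

When every input is the same splat, the interleave cannot reorder anything observable, so the result equals a single wide splat such as `splat(7, 12)` for three inputs of `splat(7, 4)`.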

For fixed-width vectors the DAG combine only occurs for
interleave factors of 3 or more, however it's not currently
safe to test this for AArch64 since there isn't any lowering
support for fixed-width interleaves. I've only added
fixed-width tests for RISCV.
2025-08-01 09:58:05 +01:00
Craig Topper
2737d013a0
[SelectionDAG] Improve the doxygen description for SDValue::isOperandOf. NFC (#151244)
SDValue::isOperandOf checks the result number in addition to the SDNode*.
SDNode::isOperandOf only checks the SDNode*.
2025-07-31 12:58:27 -07:00
Prabhu Rajasekaran
17ccb849f3
[llvm] Extract and propagate callee_type metadata
Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds
from callee_type metadata attached to indirect call instructions.

Reviewers: nikic, ilovepi

Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87575
2025-07-30 14:56:39 -07:00
Paul Walker
13f38c97d5
[LLVM][SelectionDAG] Align poison/undef binop folds with IR. (#149334)
The "at construction" binop folds in SelectionDAG::getNode() have
different behaviour when compared to the equivalent LLVM IR. This PR
makes the behaviour consistent while also extending the coverage to
include signed/unsigned max/min operations.
2025-07-30 11:20:30 +01:00
Pierre van Houtryve
c4b1557097
[DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (#146054)
Fold sequences where we extract a bunch of contiguous bits from a value,
merge them into the low bit and then check if the low bits are zero or
not.

Usually the and would be on the outside (the leaves) of the expression,
but the DAG canonicalizes it to a single `and` at the root of the
expression.
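
One concrete instance of this kind of sequence, as I read the description: four contiguous bits are OR-ed down into bit 0 and the low bit is tested. This is a sketch with hypothetical function names, and the shift amounts and width are chosen purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Sequence form: merge bits 0..3 into bit 0, then test the low bit.
bool orOfShifts(uint32_t x) {
  return ((x | (x >> 1) | (x >> 2) | (x >> 3)) & 1u) != 0;
}

// Folded form: test the same four bits with a single mask.
bool singleMask(uint32_t x) {
  return (x & 0xFu) != 0;
}
```

The two functions agree for every input, since bit 0 of the OR is set exactly when one of bits 0..3 of `x` is set.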

The reason I put this in DAGCombiner instead of the target combiner is
because this is a generic, valid transform that's also fairly niche, so
there isn't much risk of a combine loop I think.

See #136727
2025-07-30 10:27:19 +02:00
Craig Topper
eddd34227e
[TargetLowering] Use getShiftAmountConstant in CTTZTableLookup. NFC
2025-07-29 22:43:42 -07:00
Pierre van Houtryve
250f2a6367
[DAG] Remove AssertZext if the input is masked (#146052)
Remove AssertZext if the input ensures the assert cannot fail.
2025-07-29 13:05:30 +02:00
Nikita Popov
ab1f6ce482
[IR][SDAG] Remove lifetime size handling from SDAG (#150944)
Split out from https://github.com/llvm/llvm-project/pull/150248:

Specify that the argument of lifetime.start/lifetime.end is ignored and
will be removed in the future.

Remove lifetime size handling from SDAG. The size was previously
discarded during isel, so was always ignored for stack coloring anyway.
Where necessary, obtain the size of the full frame index.
2025-07-29 09:53:59 +02:00
paperchalice
21836f4a49
[SelectionDAG] Remove UnsafeFPMath in LegalizeDAG (#146316)
These global flags hinder further improvements like
[[RFC] Honor pragmas with -ffp-contract=fast](https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast)
and pass concurrency support. Remove them incrementally.
2025-07-29 08:41:21 +08:00
Matt Arsenault
1461a1c3b8
DAG: Emit an error if trying to legalize read/write register with illegal types (#145197)
This is a starting point to have better legalization failure diagnostics
2025-07-26 10:54:59 +09:00
Hood Chatham
15715f4089
[WebAssembly,llvm] Add llvm.wasm.ref.test.func intrinsic (#147486)
This adds an LLVM intrinsic for WebAssembly to test the type of a
function. It is intended to support a future clang builtin,
`__builtin_wasm_test_function_pointer_signature`, so we can test whether
calling a function pointer will fail with a function signature mismatch.

Since the type of a function pointer is just `ptr`, we can't figure out
the expected type from that. The way I figured out to encode the type
was by passing 0's of the appropriate type to the intrinsic. The first
value after the function pointer gives the expected return type and the
later values give the expected types of the arguments. So
```llvm
@llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0)
```
tests if `%func` is of type `(double, i32) -> (float)`. It will lower to:
```wat
local.get $func
table.get $__indirect_function_table
ref.test (double, i32) -> (float)
```
To indicate the function should be void, I somewhat arbitrarily picked
`token poison`, so the following tests for `(i32) -> ()`:
```llvm
@llvm.wasm.ref.test.func(ptr %func, token poison, i32 0)
```

To lower this intrinsic, we need some place to put the type information.
With `encodeFunctionSignature()` we encode the signature information
into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in
`WebAssemblyMCInstLower.cpp`.
2025-07-22 14:07:34 -07:00
Craig Topper
7cb256bcaa
[SelectionDAG] Remove FIXME and commented out code from 20 years ago. NFC (#150055)
2025-07-22 11:17:50 -07:00
Simon Pilgrim
c710d460a5
[DAG] expandVECTOR_COMPRESS - remove superfluous getFreeze. NFC. (#150062)
freeze(freeze(extract_vector_elt(x,i))) -> freeze(extract_vector_elt(x,i))
2025-07-22 18:37:12 +01:00
Craig Topper
75ec7250aa
[SelectionDAG] Use SDUse::get() instead of a static_cast to SDValue. NFC (#150043)
2025-07-22 09:28:02 -07:00
Craig Topper
8d549cf036
[SelectionDAG] Pass SDNodeFlags through getNode instead of setFlags. (#149852)
getNode updates flags correctly for CSE. Calling setFlags after getNode
may set the flags where they don't apply.

I've added a Flags argument to getSelectCC and the signature of getNode that takes
an ArrayRef of EVTs.
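
A toy model of the hazard, not LLVM's real data structures. `MiniDAG`, `Node`, and the single `noSignedWrap` flag are hypothetical, and the sketch assumes that a CSE hit keeps only the flags valid for every requester (an intersection).

```cpp
#include <cassert>
#include <map>
#include <tuple>

// A node carrying one optimization flag.
struct Node {
  bool noSignedWrap;
};

// Miniature node table that CSEs on (opcode, lhs, rhs).
class MiniDAG {
  std::map<std::tuple<int, int, int>, Node> nodes;

public:
  // Flags are passed at creation. On a CSE hit the existing node keeps
  // only the flags that both the old and the new request agree on.
  Node &getNode(int opcode, int lhs, int rhs, bool nsw) {
    auto key = std::make_tuple(opcode, lhs, rhs);
    auto it = nodes.find(key);
    if (it != nodes.end()) {
      it->second.noSignedWrap = it->second.noSignedWrap && nsw;
      return it->second;
    }
    return nodes.emplace(key, Node{nsw}).first->second;
  }
};
```

If a caller instead fetched the shared node and then set `noSignedWrap = true` on it afterwards, the flag would also apply to the earlier user that never promised it, which is the situation the commit message warns about.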
2025-07-22 08:06:30 -07:00
Simon Pilgrim
c37942df00
[DAG] visitFREEZE - limit freezing of multiple operands (#149797)
This is a partial revert of #145939 (I've kept the BUILD_VECTOR(FREEZE(UNDEF), FREEZE(UNDEF), elt2, ...) canonicalization) as we're getting reports of infinite loops (#148084).

The issue appears to be due to deep chains of nodes and how visitFREEZE replaces all instances of an operand with a common frozen version - other users of the original frozen node then get added back to the worklist but might no longer be able to confirm a node isn't poison due to recursion depth limits on isGuaranteedNotToBeUndefOrPoison.

The issue still exists with the old implementation but by only allowing a single frozen operand it helps prevent cases of interdependent frozen nodes.

I'm still working on supporting multiple operands as it's critical for topological DAG handling, but need to get a fix in for trunk and 21.x.

Fixes #148084
2025-07-22 15:40:55 +01:00
Simon Pilgrim
4b0625f051
[DAG] isNonZeroModBitWidthOrUndef - fix bugprone-argument-comment analyzer warning. NFC.
matchUnaryPredicate argument is AllowUndefs not AllowUndef
2025-07-22 10:36:59 +01:00
Nikita Popov
a7a1df8f72
[CodeGen] Remove handling for lifetime.start/end on non-alloca (#149838)
After https://github.com/llvm/llvm-project/pull/149310 we are guaranteed
that the argument is an alloca, so we don't need to look at underlying
objects (which was not a correct thing to do anyway).

This also drops the offset argument for lifetime nodes in SDAG. The
offset is fixed to zero now. (Peculiarly, while SDAG pretended to have
an offset, it just gets silently dropped during selection.)
2025-07-22 09:44:59 +02:00
Craig Topper
423cea7607
[SelectionDAG] Fix incorrect indentation. NFC
2025-07-21 13:06:21 -07:00
Simon Pilgrim
17c7c2ebe8
[DAG] Add missing Depth argument to isGuaranteedNotToBeUndefOrPoison calls inside SimplifyDemanded methods (#149550)
Ensure we don't exceed the maximum recursion depth
2025-07-20 13:06:55 +01:00
Simon Pilgrim
92e2d4e9e1
[DAG] visitFREEZE - remove unused HadMaybePoisonOperands check. NFC. (#149517)
Redundant since #145939
2025-07-18 17:38:11 +01:00
Annu Singh
148fd6ed0a
[DAG] Adding abdu/abds to canCreateUndefOrPoison (#149017)
Fixes #147695
- [Alive2 test - freeze abdu](https://alive2.llvm.org/ce/z/aafeJs)
- [Alive2 test - freeze abds](https://alive2.llvm.org/ce/z/XrSmP4)

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-07-18 17:00:44 +01:00