38183 Commits

Author SHA1 Message Date
Simon Pilgrim
c4f6d34674
[DAG] getNode - fold (sext (trunc x)) -> x iff the upper bits are already signbits (#151945)
Similar to what we already do for ZERO_EXTEND/ANY_EXTEND patterns.
2025-08-06 14:55:46 +01:00
Diana Picus
14cd133931
Revert "[AMDGPU] Intrinsic for launching whole wave functions" (#152286)
Reverts llvm/llvm-project#145859 because it broke a HIP test:
```
[34/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
FAILED: External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o 
/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/opt/botworker/llvm/External/hip/rocm-6.3.0 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /home/botworker/bbot/clang-hip-vega20/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
fatal error: error in backend: Cannot select: intrinsic %llvm.amdgcn.readfirstlane
```
2025-08-06 12:24:52 +02:00
Diana Picus
0461cd3d1d
[AMDGPU] Intrinsic for launching whole wave functions (#145859)
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Unspeakable horrors happen around calls from whole wave functions, the
plan is to improve the handling of caller/callee-saved registers in
a future patch.

Tail calls are also handled in a future patch.
2025-08-06 10:25:53 +02:00
Alex MacLean
d27802a217
[DAGCombiner] Fold setcc of trunc, generalizing some NVPTX isel logic (#150270)
That change adds support for folding a SETCC when one or both of the
operands is a TRUNCATE with the appropriate no-wrap flags. This pattern
can occur when promoting i8 operations in NVPTX, and we currently have
some ISel rules to try to handle it.
2025-08-05 19:20:17 -07:00
Craig Topper
73685583c8
[VP][RISCV] Add a vp.load.ff intrinsic for fault only first load. (#128593)
There's been some interest in supporting early-exit loops recently.
https://discourse.llvm.org/t/rfc-supporting-more-early-exit-loops/84690

This patch was extracted from our downstream where we've been using it
in our vectorizer.
2025-08-05 16:12:42 -07:00
Jann
da6424c9e3
[DebugInfo][DWARF] Don't emit bogus DW_AT_call_target for complex calls (#151378)
On X86-64, LLVM currently generates the same DWARF debug info for `call
rax` and `call [rax]`; in both cases, the generated DWARF claims that
the call goes to address RAX. This bug occurs because the X86 machine
instructions CALL64r and CALL64m both receive register operands, but
those register operands have different semantics.

To fix it, change DwarfDebug::constructCallSiteEntryDIEs() to validate
the callee operand's semantics (`OperandType`) and make sure it is not
semantically describing a memory location.

This fix will result in less DW_TAG_call_site and DW_AT_call_target
entries being generated.

There is an existing test in dwarf-callsite-related-attrs.ll that
asserts the broken behavior; remove the broken check, and instead add a
new test dwarf-callsite-related-attrs-indirect.ll that checks behavior
for indirect calls.

The existing test xray-custom-log.ll is validating something even more
broken: It checks the debug info generated by a PATCHABLE_EVENT_CALL.
`TII->getCalleeOperand()` assumes that the first argument of a call
instruction is always the destination, but the first argument of
PATCHABLE_EVENT_CALL is instead the event structure; and so we were
emitting debug info claiming the callee was stored in a register that
actually contains some kind of xray event descriptor, and the test
validates that this happens.
I am breaking and deleting this test.
I guess the intent there might have been to validate that we emit
debuginfo referencing the target of the direct call that LLVM emits
(which we don't do)? But I'm not sure.
2025-08-05 13:25:01 -07:00
Kazu Hirata
94dc3c6c49
[GlobalISel] Remove an unnecessary cast (NFC) (#152086)
getImm() already returns int64_t.
2025-08-05 07:39:06 -07:00
Kazu Hirata
86ab5dc583
[AsmPrinter] Remove an unnecessary cast (NFC) (#152085)
getValue() already returns uint64_t.
2025-08-05 07:38:58 -07:00
KRM7
ee47427386
[RegisterCoalescer] Fix subrange update when rematerialization widens a def (#151974)
Currently, when an instruction rematerialized by the register coalescer
defines more subregs of the destination register
than the original COPY instruction did, we only add dead defs for the
newly defined subregs if they were not defined anywhere
else. For example, consider something like this before
rematerialization:
```
 %0:reg64 = CONSTANT 1
 %1:reg128.sub_lo64_lo32 = COPY %0.lo32
 %1:reg128.sub_lo64_hi32 = ...
 ...
```
that would look like this after rematerializing `%0`:
```
 %0:reg64 = CONSTANT 2
 %1:reg128.sub_lo64 = CONSTANT 2
 %1:reg128.sub_lo64_hi32 = ...
 ...
```
A dead def would not be added for `%1.sub_lo64_hi32` at the 2nd
instruction because it's subrange wasn't empty beforehand.
2025-08-05 22:32:31 +09:00
Simon Pilgrim
9f50224b25
[DAG] Remove Depth=1 hack from isGuaranteedNotToBeUndefOrPoison checks (#152127)
Now that #146490 removed the assertion in visitFreeze to assert that the
node was still isGuaranteedNotToBeUndefOrPoison we no longer need this
reduced depth hack (which had to account for the difference in depth of
freeze(op()) vs op(freeze())

Helps with some of the minor regressions in #150017
2025-08-05 13:35:04 +01:00
Paul Walker
94d374ab6c
[LLVM][CGP] Allow finer control for sinking compares. (#151366)
Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse grained by
not taking into account the differences between scalar and vector
compares. This PR extends the interface to take an EVT to allow finer
control.
    
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
2025-08-05 11:43:41 +01:00
Simon Pilgrim
d561259a08
[DAG] visitFREEZE - replace multiple frozen/unfrozen uses of an SDValue with just the frozen node (#150017)
Similar to InstCombinerImpl::freezeOtherUses, attempt to ensure that we
merge multiple frozen/unfrozen uses of a SDValue. This fixes a number of
hasOneUse() problems when trying to push FREEZE nodes through the DAG.

Remove SimplifyMultipleUseDemandedBits handling of FREEZE nodes as we
now want to keep the common node, and not bypass for some nodes just
because of DemandedElts.

Fixes #149799
2025-08-05 09:24:09 +01:00
Craig Topper
a3a8e1c064
[TargetLowering][RISCV] Use sra for (X & -256) == 256 -> (X >> 8) == 1 if it yields a better icmp constant. (#151762)
If using srl does not produce a legal constant for the RHS of the
final compare, try to use sra instead.
    
Because the AND constant is negative, the sign bits participate in the
compare. Using an arithmetic shift right duplicates that bit.
2025-08-04 09:00:41 -07:00
woruyu
38bfe9ae56
[DAG] combineVSelectWithAllOnesOrZeros - missing freeze (#150388)
This PR resolves https://github.com/llvm/llvm-project/issues/150069

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-04 15:55:12 +01:00
Kazu Hirata
76abef60b0
[CodeGen] Remove an unnecessary cast (NFC) (#151901)
getOpcode() already returns unsigned.
2025-08-04 07:40:10 -07:00
Simon Pilgrim
6de745b547
Revert "[X86] Correct 32-bit immediate assertion and fix 64-bit lowering for huge frame offsets" (#151975)
Reverts llvm/llvm-project#123872 - this is breaking on EXPENSIVE_CHECKS builds

Co-authored-by: Abhishek Kaushik <abhishek.kaushik@intel.com>
2025-08-04 15:32:28 +01:00
Simon Pilgrim
5c2054a4ea
[DAG] getMinMaxOpcodeForFP - split if-else chain. NFC. (#151938)
(style) All cases return so split the chain
2025-08-04 15:32:08 +01:00
Abhishek Kaushik
1c0ac80d4a
[DAG] Combine store + vselect to masked_store (#145176)
Add a new combine to replace
```
(store ch (vselect cond truevec (load ch ptr offset)) ptr offset)
```
to
```
(mstore ch truevec ptr offset cond)
```

This saves a blend operation on targets that support conditional stores.
2025-08-04 19:05:36 +05:30
Sander de Smalen
ed5bd23867 Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#134408)"
This reverts commit bae8f1336db6a7f3288a7dcf253f2d484743b257.

Some issues were found:
* https://github.com/llvm/llvm-project/issues/151768
* https://github.com/llvm/llvm-project/issues/151592
* https://github.com/llvm/llvm-project/pull/134408#issuecomment-3145468321
* https://github.com/llvm/llvm-project/issues/151888#issuecomment-3149286820

I'll revert this for the time being while I investigate.
2025-08-04 12:07:30 +00:00
Fabian Ritter
95191d5460
[GISel] Set more MIFlags when translating GEPs (#151708)
The IRTranslator sets the flags now more consistently with
`SelectionDAGBuilder::visitGetElementPtr()`. This affects `nuw` and `nusw`, as
well as the recently introduced `inbounds` MIFlag (see PR #150900).

This PR also adds more tests to `AArch64/GlobalISel/irtranslator-gep-flags.ll`
to cover all points in `IRTranslator::translateGetElementPtr` that set flags.

For SWDEV-516125.
2025-08-04 13:25:33 +02:00
jyli0116
961a4aabf8
[GlobalISel] Add constant matcher for APInt (#151357)
Changed m_SpecificICst, m_SpecificICstSplat and m_SpecificICstorSplat to
match against APInt as well.
2025-08-04 09:47:21 +01:00
Nikita Popov
86727fe9a1
[IR] Allow poison argument to lifetime markers (#151148)
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.

It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.

Fixes https://github.com/llvm/llvm-project/issues/151119.
2025-08-04 10:02:04 +02:00
Fangrui Song
d6c2e53151 MCSymbolXCOFF: Migrate away from classof
The object file format specific derived classes are used in context
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 18:18:44 -07:00
Fangrui Song
570e09047c MCSymbolWasm: Remove classof
The object file format specific derived classes are used in context
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 17:28:33 -07:00
Fangrui Song
b51ff2705f MCSymbolELF: Migrate away from classof
The object file format specific derived classes are used in context
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 16:05:35 -07:00
Fangrui Song
e640ca8b9a MCSymbolELF: Migrate away from classof
The object file format specific derived classes are used in context
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 15:45:36 -07:00
Connector Switch
8b7f81f2de
[NFC] Fix assignment typo. (#151864) 2025-08-03 22:32:00 +08:00
Wesley Wiser
69bec0afbb
[X86] Correct 32-bit immediate assertion and fix 64-bit lowering for huge frame offsets (#123872)
The assertion previously did not work correctly because the operand was
being truncated to an `int` prior to comparison.

Change the assertion into a a reported error as suggested in
https://github.com/llvm/llvm-project/pull/101840#issuecomment-2304992425
by @arsenm

Finally, fix the lowering on 64-bit targets so that offsets larger than
32-bit are correctly addressed and add tests for various reported
issues.
2025-08-03 15:36:23 +05:30
Min-Yih Hsu
7ebbbd885f
[DAG] Always use stack to promote bitcast when the source is vector (#151065)
The optimization introduced by #125637 tried to avoid using stacks to
promote bitcast with vector result type. However, it wouldn't be correct
if the input type is vector. This patch limits that optimizations to
only scalar to vector bitcasts.
2025-08-02 15:32:10 -07:00
Craig Topper
f952a84f2f [TargetLowering] Use getShiftAmountConstant in buildSDIVPow2WithCMov. 2025-08-02 10:50:46 -07:00
AZero13
23022a4683
[SelectionDAG] Move sign pattern check from AArch64 and ARM to general SelectionDAG (#151736)
This works on all cases much like the XOR case above it in SelectionDAG.
2025-08-01 14:46:51 -07:00
Paul Walker
ceb2b9c141
[LLVM][DAGCombiner] fold (shl (X * vscale(C0)), C1) -> (X * vscale(C0 << C1)). (#150651) 2025-08-01 11:42:45 +01:00
黃國庭
f04ea2ef1c
Add m_SelectCCLike matcher to match SELECT_CC or SELECT with SETCC (#149646)
Fix #147282 and  Follow-up to #148834

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-01 10:12:05 +01:00
David Sherwood
05b16aff0f
[DAGCombiner] Add combine for vector interleave of splats (#151110)
This patch adds two DAG combines:

1. vector_interleave(splat, splat, ...) -> {splat,splat,...}
2. concat_vectors(splat, splat, ...) -> wide_splat

where all the input splats are identical. Both of these
together enable us to fold
  concat_vectors(vector_interleave(splat, splat, ...))
into a wide splat. Post-legalisation we must only do the
concat_vector combine if the wider type and splat operation
is legal.

For fixed-width vectors the DAG combine only occurs for
interleave factors of 3 or more, however it's not currently
safe to test this for AArch64 since there isn't any lowering
support for fixed-width interleaves. I've only added
fixed-width tests for RISCV.
2025-08-01 09:58:05 +01:00
Ruiling, Song
451912a24a
[MachineScheduler] Make cluster check more efficient (#150884) 2025-08-01 16:00:42 +08:00
Shilei Tian
faa4c4c2dc
[RegAlloc] Fix use-after-free in RegAllocBase::cleanupFailedVReg (#151435)
#128400 introduced a use-after-free bug in
`RegAllocBase::cleanupFailedVReg` when removing intervals from regunits.
The issue is from the `InterferenceCache` in `RAGreedy`, which holds
`LiveRange*`. The current `InterferenceCache` APIs make it difficult to
update it, and there isn't a straightforward way to do that.

Since #128400 already mentions it's not clear about the necessity of
removing intervals from regunits, this PR avoids the issue by simply
skipping that step.

Fixes SWDEV-527146.
2025-08-01 00:07:57 -04:00
Prabhu Rajasekaran
7c6a1c3d15
[llvm][AsmPrinter] Emit call graph section
Collect the necessary information for constructing the call graph
section, and emit to .callgraph section of the binary.

MD5 hash of the callee_type metadata string is used as the numerical
type id emitted.

Reviewers: ilovepi

Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87576
2025-07-31 13:38:20 -07:00
Craig Topper
2737d013a0
[SelectionDAG] Improve the doxygen description for SDValue::isOperandOf. NFC (#151244)
SDValue::isOperandOf checks the result number in addition to the SDNode*.
SDNode::isOperandOf only checks the SDNode*.
2025-07-31 12:58:27 -07:00
Florian Hahn
078d214672
[TailDup] Delay aggressive computed-goto taildup to after RegAlloc. (#150911)
https://github.com/llvm/llvm-project/pull/114990 allowed more aggressive
tail duplication for computed-gotos in both pre- and post-regalloc tail
duplication.

In some cases, performing tail-duplication too early can lead to worse
results, especially if we duplicate blocks with a number of phi nodes.

This is causing a ~3% performance regression in some workloads using
Python 3.12.

This patch updates TailDup to delay aggressive tail-duplication for
computed gotos to after register allocation.

This means we can keep the non-duplicated version for a bit longer
throughout the backend, which should reduce compile-time as well as
allowing a number of optimizations and simplifications to trigger before
drastically expanding the CFG.

For the case in https://github.com/llvm/llvm-project/issues/106846, I
get the same performance with and without this patch on Skylake.

PR: https://github.com/llvm/llvm-project/pull/150911
2025-07-31 19:20:05 +01:00
Peter Collingbourne
7cdc9781d4
MachineInstrBuilder: Introduce copyMIMetadata() function.
This reduces the amount of boilerplate required when adding a new
field to MIMetadata and reduces the chance of bugs like the
one I fixed in TargetInstrInfo::reassociateOps.

Reviewers: arsenm, nikic

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/133535
2025-07-31 09:53:01 -07:00
Florian Hahn
69f3ea0852
[MachineBB] Make sure there are successors in terminatorIsComputedGoto. (#151342)
Currently terminatorIsComputedGoto will return for blocks with a
indirect branch terminator and no successor. If there are no successor,
the terminator is likely not a computed goto, return false in that case.

Note that this is currently NFC, as the only use checks it only if there
are successors, but it will be needed in
https://github.com/llvm/llvm-project/pull/150911.

PR: https://github.com/llvm/llvm-project/pull/151342
2025-07-31 17:52:45 +01:00
Prabhu Rajasekaran
9e0dc4f737
[MachineFunction] Move CallSiteInfo constructor out of header (#151520) 2025-07-31 07:34:46 -07:00
Phoebe Wang
ebf96f9316
[X86][APX] Do optimizeMemoryInst for v1X masked load/store (#151331)
Fix redundant LEA: https://godbolt.org/z/34xEYE818
2025-07-31 11:52:39 +08:00
Prabhu Rajasekaran
17ccb849f3
[llvm] Extract and propagate callee_type metadata
Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds
from callee_type metadata attached to indirect call instructions.

Reviewers: nikic, ilovepi

Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87575
2025-07-30 14:56:39 -07:00
Kazu Hirata
5672a8723f
[CodeGen] Remove an unnecessary cast (NFC) (#151280)
LoopValStage is already of int.
2025-07-30 07:30:05 -07:00
Sander de Smalen
bae8f1336d
Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#134408)
This tries to reland #123632 (previously reverted by commit
6b1db79887df19bc8e8c946108966aa6021c8b87)

This PR aims to fix coalescing of SUBREG_TO_REG when sub-register
liveness tracking is enabled and this is now the so-manieth
reincarnation of this effort :)

This change is needed in order to enable subreg liveness tracking for 
AArch64, because without the implicit-def, Machine Copy Propagation
would remove a 'redundant' copy because it doesn't realise that the 
top 32-bits of the register are zeroed, which subsequent instructions
rely on. 

Changes compared to previous PR: 

* Rather than updating all instructions that define the source register
(SrcReg) of the SUBREG_TO_REG, this new approach only updates
instructions
that define SrcReg when they dominate the SUBREG_TO_REG. The live-ranges
  are updated accordingly.
2025-07-30 14:42:24 +01:00
Fabian Ritter
ef6eaa045a
[GISel] Introduce MIFlags::InBounds (#150900)
This flag applies to G_PTR_ADD instructions and indicates that the operation
implements an inbounds getelementptr operation, i.e., the pointer operand is in
bounds wrt. the allocated object it is based on, and the arithmetic does not
change that.

It is set when the IRTranslator lowers inbounds GEPs (currently only in some
cases, to be extended with a future PR), and in the
(build|materialize)ObjectPtrOffset functions.

Inbounds information is useful in ISel when we have instructions that perform
address computations whose intermediate steps must be in the same memory region
as the final result. A follow-up patch will start using it for AMDGPU's flat
memory instructions, where the immediate offset must not affect the memory
aperture of the address.

This is analogous to a concurrent effort in SDAG: #131862
(related: #140017, #141725).

For SWDEV-516125.
2025-07-30 13:01:23 +02:00
Paul Walker
13f38c97d5
[LLVM][SelectionDAG] Align poison/undef binop folds with IR. (#149334)
The "at construction" binop folds in SelectionDAG::getNode() has
different behaviour when compared to the equivalent LLVM IR. This PR
makes the behaviour consistent while also extending the coverage to
include signed/unsigned max/min operations.
2025-07-30 11:20:30 +01:00
Pierre van Houtryve
c4b1557097
[DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (#146054)
Fold sequences where we extract a bunch of contiguous bits from a value,
merge them into the low bit and then check if the low bits are zero or
not.

Usually the and would be on the outside (the leaves) of the expression,
but the DAG canonicalizes it to a single `and` at the root of the
expression.

The reason I put this in DAGCombiner instead of the target combiner is
because this is a generic, valid transform that's also fairly niche, so
there isn't much risk of a combine loop I think.

See #136727
2025-07-30 10:27:19 +02:00
Craig Topper
eddd34227e [TargetLowering] Use getShiftAmountConstant in CTTZTableLookup. NFC 2025-07-29 22:43:42 -07:00