37471 Commits

Author SHA1 Message Date
Simon Pilgrim
0237216f16
[DAG] canCreateUndefOrPoison - add EXTRACT_SUBVECTOR handling (#132745)
Similar to INSERT_SUBVECTOR - the index is constant and will be inbounds
2025-03-24 16:03:47 +00:00
Kazu Hirata
1904241a9e
[CodeGen] Avoid repeated hash lookups (NFC) (#132658) 2025-03-24 07:46:35 -07:00
Pierre van Houtryve
c457c88951
[GlobalISel] Combine (sext (trunc x)) to (sext_inreg x) (#131622)
Split from #131312
2025-03-24 09:32:04 +01:00
Pierre van Houtryve
6e3c24fc0a
[DAG] Combine (sext (sext_in_reg x)) to (sext_in_reg (any_extend x)) (#132386) 2025-03-24 09:31:02 +01:00
Antonio Frighetto
ade2276517 [RegAllocFast] Ensure live-in vregs get reloaded after INLINEASM_BR spills
We have already ensured in 9cec2b246e719533723562950e56c292fe5dd5ad
that `INLINEASM_BR` output operands get spilled onto the stack, both
in the fallthrough path and in the indirect targets. Since reloads of
live-ins values into physical registers contextually happen after all
MIR instructions (and ops) have been visited, make sure such loads are
placed at the start of the block, but after prologues or `INLINEASM_BR`
spills, as otherwise this may cause stale values to be read from the
stack.

Fixes: #74483, #110251.
2025-03-24 09:19:53 +01:00
Fangrui Song
7e6d008023 AsmPrinter: Remove unneeded lowerRelativeReference overrides
The function is only called by AsmPrinter, where there is a fallback
when lowerRelativeReference returns nullptr.

wasm and XCOFF could use the fallback code.

(lowerRelativeReference was introduced in 2016 (https://reviews.llvm.org/D17938)
for C++ relative vtables, but C++ relative vtables ended up using
dso_local_equivalent. llvm/test/MC/COFF/cross-section-relative.ll also
uses this.)
2025-03-23 23:58:41 -07:00
Akshat Oke
174110bf3c
[CodeGen][NPM] Port LiveDebugValues to NPM (#131563) 2025-03-24 11:34:45 +05:30
Kazu Hirata
1019457891
[CodeGen] Use *Set::insert_range (NFC) (#132651)
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(E);

down to:

  Set.insert_range(Range);
2025-03-23 21:20:44 -07:00
Mingming Liu
3b20ac00f9
[NFC]Don't use else after a return (#132644)
A trivial code clean-up per
https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
2025-03-23 18:34:52 -07:00
Kazu Hirata
41b76119ec
[llvm] Use range constructors for *Set (NFC) (#132636) 2025-03-23 15:50:34 -07:00
Fangrui Song
dfae1f968e MCValue: Simplify code with getSubSym 2025-03-23 12:22:44 -07:00
Fangrui Song
b73e144bdf MCValue: Simplify code with getSubSym
MCValue::SymB is a MCSymbolRefExpr *, which might become MCSymbol * in
the future. Simplify some code that uses MCValue::SymB.
2025-03-23 12:13:13 -07:00
Jonathan Cohen
7bda9caa49
Revert "[AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060) (#132607)
This reverts commit c4caf949aa934a219e84d4ba0530bd535e698cdb.
2025-03-23 13:58:00 +02:00
Jonathan Cohen
c4caf949aa
[AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060)
This pattern shows up often in media libraries. The optimization should only
kick in for O3. Currently only supports a single family of accumulation
instructions, but can easily be expanded to support additional
instructions in the future.
2025-03-23 13:25:35 +02:00
Kazu Hirata
f3e8e80563
[llvm] Construct SmallVector with ArrayRef (NFC) (#132560) 2025-03-22 13:11:31 -07:00
Kazu Hirata
cb729be11c
[CodeGen] Avoid repeated hash lookups (NFC) (#132513) 2025-03-22 08:08:28 -07:00
Kazu Hirata
1b189cab5e
[llvm] Use *Set::insert_range (NFC) (#132509)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range in
conjunction with llvm::{predecessors,successors} and
MachineBasicBlock::{predecessors,successors}.
2025-03-22 08:07:33 -07:00
Mikhail R. Gadelha
f138e36d52
[SelectionDAG][RISCV] Avoid store merging across function calls (#130430)
This patch improves DAGCombiner's handling of potential store merges by
detecting function calls between loads and stores. When a function call
exists in the chain between a load and its corresponding store, we avoid
merging these stores if the spilling is unprofitable.

We had to implement a hook on TLI, since TTI is unavailable in
DAGCombine. Currently, it's only enabled for riscv.

This is the DAG equivalent of PR #129258
2025-03-22 10:35:25 -03:00
Fangrui Song
4b417992dd [CodeGen] Rename PLTRelativeVariantKind. NFC
Migrate away from the deprecated MCSymbolRefExpr::VariantKind.
The name "Specifier" is utilized in a few *MCExpr.

> "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.
2025-03-21 23:02:08 -07:00
pzzp
d6a2cca77e
[llvm:ir] Add support for constant data exceeding 4GiB (#126481)
The test file is over 4GiB, which is too big, so I didn’t submit it.
2025-03-21 11:44:01 -07:00
Tony Varghese
ff9c5c334a
[shrinkwrap] PowerPC's FP register should be honored when processing the save point for prologue. (#129855)
When generating code for functions that have `__builtin_frame_address`
calls and `noinline` attribute, prologue was not emitted correctly
leading to an assertion failure in PowerPC. The issue was due to
improper insertion of prologue for a function that contain llvm
`__builtin_frame_address`.
Shrink-wrap pass computes the save and restore points of a function.
Default points are the entry and exit points of the function. During
shrink-wrapping the frame-pointer was not honored like the stack pointer
and it was considered as a callee-saved register. This change will treat
the FP similar to SP and will insert the prolog on top the instruction
containing FP.

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-03-21 12:55:39 -04:00
Kazu Hirata
67a631b406
[CodeGen] Avoid repeated hash lookups (NFC) (#132329) 2025-03-21 08:00:45 -07:00
Ryotaro Kasuga
857a04cd76
[MachinePipeliner] Fix incorrect handlings of unpipelineable insts (#126057)
There was a case where `normalizeNonPipelinedInstructions` didn't
schedule unpipelineable instructions correctly, which could generate
illegal code. This patch fixes this issue by rejecting the schedule if
we fail to insert the unpipelineable instructions at stage 0.

Here is a part of the debug output for `sms-unpipeline-insts3.mir`
before applying this patch.

```
SU(0):   %27:gpr32 = PHI %21:gpr32all, %bb.3, %28:gpr32all, %bb.4
  Successors:
    SU(14): Data Latency=0 Reg=%27
    SU(15): Anti Latency=1

...

SU(14):   %41:gpr32 = ADDWrr %27:gpr32, %12:gpr32common
  Predecessors:
    SU(0): Data Latency=0 Reg=%27
    SU(16): Ord  Latency=0 Artificial
  Successors:
    SU(15): Data Latency=1 Reg=%41
SU(15):   %28:gpr32all = COPY %41:gpr32
  Predecessors:
    SU(14): Data Latency=1 Reg=%41
    SU(0): Anti Latency=1
SU(16):   %30:ppr = WHILELO_PWW_S %27:gpr32, %15:gpr32, implicit-def $nzcv
  Predecessors:
    SU(0): Data Latency=0 Reg=%27
  Successors:
    SU(14): Ord  Latency=0 Artificial

...

Do not pipeline SU(16)
Do not pipeline SU(1)
Do not pipeline SU(0)
Do not pipeline SU(15)
Do not pipeline SU(14)
SU(0) is not pipelined; moving from cycle 19 to 0 Instr: ...
SU(1) is not pipelined; moving from cycle 10 to 0 Instr: ...
SU(15) is not pipelined; moving from cycle 28 to 19 Instr: ...
SU(16) is not pipelined; moving from cycle 19 to 0 Instr: ...
Schedule Found? 1 (II=10)

...

cycle 9 (1) (14) %41:gpr32 = ADDWrr %27:gpr32, %12:gpr32common

cycle 9 (1) (15) %28:gpr32all = COPY %41:gpr32
```

The SUs are traversed in the order of the original basic block, so in
this case a new cycle of each instruction is determined in the order of
`SU(0)`, `SU(1)`, `SU(14)`, `SU(15)`, `SU(16)`. Since there is an
artificial dependence from `SU(16)` to `SU(14)`, which is contradict to
the original SU order, the new cycle of `SU(14)` must be greater than or
equal to the cycle of `SU(16)` at that time. This results in the failure
of scheduling `SU(14)` at stage 0. For now, we reject the schedule for
such cases.
2025-03-21 23:07:41 +09:00
Kazu Hirata
599005686a
[llvm] Use *Set::insert_range (NFC) (#132325)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-20 22:24:06 -07:00
yonghong-song
0ffe83feac
[SelectionDAG] Not issue TRAP node if naked function (#132147)
In [1], Nikita Popov suggested that during lowering 'unreachable' insn
should not generate extra code for naked functions, and this applies to
all architectures. Note that for naked functions, 'unreachable' insn is
necessary in IR since the basic block needs a terminator to end.

This patch checked whether a function is naked function or not. If it is
a naked function, 'unreachable' insn will not generate ISD::TRAP.

  [1] https://github.com/llvm/llvm-project/pull/131731

Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
2025-03-20 18:18:03 -07:00
Michael Maitland
00fabd21bc
[RISCV][RegAlloc] Add getCSRFirstUseCost for RISC-V (#131349)
This is based off of 63efd8e7e68bc.

The following table shows the percent change to the dynamic instruction
count when the function in this patch returns 0 (default) versus other
values.

| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16
over 0 | % speedup 64 over 0 | % speedup 128 over 0 |
| --------------- | ---------------------- | --------------------- |
--------------------- | -------------------- | -------------------- |
| 500.perlbench_r | 0.001018570165 | 0.001049508358 | 0.001001106529 |
0.03382582818 | 0.03395354577 |
| 502.gcc_r | 0.02850551412 | 0.02170512371 | 0.01453021263 |
0.06011008637 | 0.1215691521 |
| 505.mcf_r | -0.00009506373338 | -0.00009090057642 | -0.0000860991497 |
-0.00005027849766 | 0.00001251173791 |
| 520.omnetpp_r | 0.2958940288 | 0.2959715925 | 0.2961141505 |
0.2959823497 | 0.2963124341 |
| 523.xalancbmk_r | -0.0327074721 | -0.01037021046 | -0.3226810542 |
0.02127133714 | 0.02765388389 |
| 525.x264_r | 0.0000001381714403 | -0.00000007041540345 |
-0.00000002156399465 | 0.0000002108993364 | 0.0000002463382874 |
| 531.deepsjeng_r | 0.00000000339777238 | 0.000000003874652714 |
0.000000003636212547 | 0.000000003874652714 | 0.000000003159332213 |
| 541.leela_r | 0.0009186059953 | -0.000424159199 | 0.0004984456879 |
0.274948447 | 0.8135521414 |
| 557.xz_r | -0.000000003547118854 | -0.00004896449559 |
-0.00004910691576 | -0.0000491109983 | -0.00004895599589 |
| geomean | 0.03265937388 | 0.03424232324 | -0.00107917442 |
0.07629116165 | 0.1439913192 |

The following table shows the percent change to the runtime when the
function in this patch returns 0 (default) versus other values.

| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16
over 0 | % speedup 64 over 0 | %speedup 128 over 0 |
| --------------- | ------------------ | ------------------ |
------------------- | ------------------- | ------------------- |
| 500.perlbench_r | 0.1722356761 | 0.2269681109 | 0.2596825578 |
0.361573851 | 1.15041305 |
| 502.gcc_r | -0.548415855 | -0.06187002799 | -0.5553684674 |
-0.8876686237 | -0.4668665535 |
| 505.mcf_r | -0.8786414258 | -0.4150938441 | -1.035517726 |
-0.1860770377 | -0.01904825648 |
| 520.omnetpp_r | 0.4130256072 | 0.6595976188 | 0.897332171 |
0.6252625622 | 0.3869467278 |
| 523.xalancbmk_r | 1.318132014 | -0.003927574 | 1.025962975 |
1.090320253 | -0.789206202 |
| 525.x264_r | -0.03112871796 | -0.00167557587 | 0.06932423155 |
-0.1919840015 | -0.1203585732 |
| 531.deepsjeng_r | -0.259516072 | -0.01973455652 | -0.2723227894 |
-0.005417022257 | -0.02222388177 |
| 541.leela_r | -0.3497178495 | -0.3510447393 | 0.1274508001 |
0.6485542452 | 0.2880651727 |
| 557.xz_r | 0.7683565263 | -0.2197509447 | -0.0431183874 |
0.07518130872 | 0.5236853039 |
| geomean | 0.06506952742 | -0.0211865386 | 0.05072694648 | 0.1684530637
| 0.1020533557 |

I chose to set the value to 5 on RISC-V because it has improvement to
both the dynamic IC and the runtime and because it showed good results
empirically and had a similar effect as setting it to higher numbers.

I looked at some diff and it seems like this patch leads to two things:
1. Less spilling -- not spilling the CSR led to better register
allocation and helped us avoid spills down the line
2. Avoid spilling CSR but spill more on paths that static heuristics
estimate as cold.
2025-03-20 15:20:04 -04:00
Philip Reames
4d4d9d5d33
[TTI] Use TypeSize in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#132244)
Motivation is supporting scalable spills and reloads, e.g. in
https://github.com/llvm/llvm-project/pull/120524.

Looking at this API, I'm suspicious that the access size should just be
coming from the memory operand on the load or store, but we don't appear
to be consistently setting that up. That's a larger change so I may or
may not bother pursuing that.
2025-03-20 10:17:36 -07:00
Phoebe Wang
64555e3d48
[X86][NFCI] Add IsStore parameter to hasConditionalLoadStoreForType (#132153)
Address
https://github.com/llvm/llvm-project/pull/132032#issuecomment-2736936769
2025-03-20 18:25:09 +08:00
Nikita Popov
0738f70615
[Intrinsics] Add Intrinsic::getFnAttributes() (NFC) (#132029)
Most places that call Intrinsic::getAttributes() are only interested in
the function attributes, so add a separate function for that.

The motivation for this is that I'd like to add the ability to specify
range attributes on intrinsics, which requires knowing the function
type. This avoids needing to know the type for most attribute queries.
2025-03-20 09:20:39 +01:00
Diana Picus
e17b3cdfb3
[AMDGPU] Dynamic VGPR support for llvm.amdgcn.cs.chain (#130094)
The llvm.amdgcn.cs.chain intrinsic has a 'flags' operand which may
indicate that we want to reallocate the VGPRs before performing the
call.

A call with the following arguments:
```
llvm.amdgcn.cs.chain %callee, %exec, %sgpr_args, %vgpr_args,
  /*flags*/0x1, %num_vgprs, %fallback_exec, %fallback_callee
```
is supposed to do the following:
- copy the SGPR and VGPR args into their respective registers
- try to change the VGPR allocation
- if the allocation has succeeded, set EXEC to %exec and jump to
%callee, otherwise set EXEC to %fallback_exec and jump to
%fallback_callee

This patch implements the dynamic VGPR behaviour by generating an
S_ALLOC_VGPR followed by S_CSELECT_B32/64 instructions for the EXEC and
callee. The rest of the call sequence is left undisturbed (i.e.
identical to the case where the flags are 0 and we don't use dynamic
VGPRs). We achieve this by introducing some new pseudos
(SI_CS_CHAIN_TC_Wn_DVGPR) which are expanded in the SILateBranchLowering
pass, just like the simpler SI_CS_CHAIN_TC_Wn pseudos. The main reason
is so that we don't risk other passes (particularly the PostRA
scheduler) introducing instructions between the S_ALLOC_VGPR and the
jump. Such instructions might end up using VGPRs that have been
deallocated, or the wrong EXEC mask. Once the whole backend treats
S_ALLOC_VGPR and changes to EXEC as barriers for instructions that use
VGPRs, we could in principle move the expansion earlier (but in the
absence of a good reason for that my personal preference is to keep it
later in order to make debugging easier).

Since the expansion happens after register allocation, we're careful to
select constants to immediate operands instead of letting ISel generate
S_MOVs which could interfere with register allocation (i.e. make it look
like we need more registers than we actually do).

For GFX12, S_ALLOC_VGPR only works in wave32 mode, so we bail out during
ISel in wave64 mode. However, we can define the pseudos for wave64 too
so it's easy to handle if future generations support it.

---------

Co-authored-by: Ana Mihajlovic <Ana.Mihajlovic@amd.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-03-20 08:38:04 +01:00
David Green
53a395fda3
[AArch64][GlobalISel] Legalize more CTPOP vector types. (#131513)
Similar to other operations, s8, s16 s32 and s64 vector elements are
clamped to legal vector sizes, odd number of elements are widened to the
next power-2 and s128 is scalarized.

This helps legalize cttz as well as ctpop.
2025-03-20 07:21:01 +00:00
David Green
b0876994eb
[AArch64][GlobalISel] Clean up CTLZ vector type legalization. (#131514)
Similar to other operations, s8, s16 and s32 vector elements are clamped
to legal vector sizes, but in this case s64 are scalarized to use the
gpr instructions. This allows vector types to split as opposed to
scalarizing.
2025-03-19 19:28:36 +00:00
Nicholas Guy
3f4b2f12a1
[llvm] Fix crash when complex deinterleaving operates on an unrolled loop (#129735)
When attempting to perform complex deinterleaving on an unrolled loop
containing a reduction, the complex deinterleaving pass would fail to
accommodate the wider types when accumulating the unrolled paths.
Instead of trying to alter the incoming IR to fit expectations, the pass
should instead decide against processing any reduction that results in a
non-complex or non-vector value.
2025-03-19 13:44:02 +00:00
Matt Arsenault
5b6b4fdb4b
DAG: Fix promote of half freeze (#131844) 2025-03-19 18:30:34 +07:00
Alex Bradbury
6db2594c48
[PreISelIntrinsicLowering] Zext/trunc count parameter as necessary for memset_pattern16 emission (#129239)
This patch cleans up the handling of the count parameter in general,
though was initially motivated by a compiler crash upon a memset.pattern
with a narrow count causing a compiler crash due to different types for
CreateMul when converting the count to the number of bytes.

The logic used to name globals means there is some minor renaming churn
in the output to
test/Transforms/PreISelIntrinsicLowering/X86/memset-pattern.ll
irrelevant to the newly added tests (that would crash before).
2025-03-19 11:16:24 +00:00
Nikita Popov
8f66fb7842 [GlobalMerge] Fix handling of const options
For the NewPM, the merge-const option was assigned to an unused
option field. Assign it to the correct one. The merge-const-aggressive
option was not supported -- and invalid options were silently ignored.
Accept it and error on invalid options.

For the LegacyPM, the corresponding cl::opt options were ignored when
called via opt rather than llc.
2025-03-18 15:06:39 +01:00
David Green
bd1be8a242
[CodeGen][GlobalISel] Add a getVectorIdxWidth and getVectorIdxLLT. (#131526)
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
2025-03-18 08:31:11 +00:00
Kazu Hirata
62204482c0
[CodeGen] Avoid repeated hash lookups (NFC) (#131722) 2025-03-18 00:26:59 -07:00
Akshat Oke
6be6400848
[LiveDebugValues][NFC] Remove TargetPassConfig from LDVImpl (#131562)
TPC is only used to access the option `ShouldEmitDebugEntryValues`.
2025-03-18 11:04:54 +05:30
Jim Lin
00cad3ed22
[SDAG] Handle extract_subvector in isKnownNeverNaN (#131581)
Propagate nnan across extract_subvector.
2025-03-18 09:37:16 +08:00
Pierre van Houtryve
7dcea28bf9
[AMDGPU] Add identity_combines to RegBankCombiner (#131305) 2025-03-17 10:11:28 +01:00
Kazu Hirata
05607a3f39
[CodeGen] Avoid repeated hash lookups (NFC) (#131551) 2025-03-16 23:52:16 -07:00
Hua Tian
b09b9ac108
[llvm][CodeGen] Fix the empty interval issue in Window Scheduler (#129204)
The interval of newly generated reg in ModuloScheduleExpander is empty.
This will cause crash at some corner case. This patch recalculate the
live intervals of these regs.
2025-03-17 14:28:47 +08:00
Akshat Oke
687c9d359e
[CodeGen][NPM] Port FEntryInserter to NPM (#129857) 2025-03-17 10:35:53 +05:30
Kazu Hirata
b648576528
[CodeGen] Avoid repeated hash lookups (NFC) (#131495) 2025-03-16 11:02:57 -07:00
Fangrui Song
7722d7519c [MC] evaluateAsRelocatableImpl: remove the Fixup argument
Follow-up to d6fbffa23c84e622735b3e880fd800985c1c0072 . This commit
updates all call sites and removes the argument from the function.
2025-03-15 16:10:19 -07:00
Kazu Hirata
6a1fd24e9a
[CodeGen] Avoid repeated hash lookups (NFC) (#131422) 2025-03-15 09:11:59 -07:00
Craig Topper
3b5413c77f [CodeGen] Use MCRegister in DbgVariableLocation. NFC 2025-03-15 00:46:17 -07:00
Matthias Braun
e6382f2111
SelectionDAG: neg (and x, 1) --> SIGN_EXTEND_INREG x, i1 (#131239)
The pattern
```LLVM
%shl = shl i32 %x, 31
%ashr = ashr i32 %shl, 31
```
would be combined to `SIGN_EXTEND_INREG %x, ValueType:ch:i1` by
SelectionDAG.
However InstCombine normalizes this pattern to:
```LLVM
%and = and i32 %x, 1
%neg = sub i32 0, %and
```
This adds matching code to DAGCombiner to catch this variant as well.
2025-03-14 10:47:56 -07:00
Philip Reames
bdb4012fe3 [CodeGen] Remove parameter from LiveRangeEdit::canRematerializeAt [NFC]
Only one caller cares about the true case of this parameter, so move
the check to that single caller.  Note that RegisterCoalescer seems
like it should care, but it already duplicates the check several
lines above.
2025-03-14 09:12:07 -07:00