38435 Commits

Author SHA1 Message Date
Ryan Cowan
eb803df502
[AArch64][GlobalISel] Add G_FMODF instruction (#160061)
This commit adds the intrinsic `G_FMODF` to GMIR & enables its
translation, legalization and instruction selection in AArch64.
2025-10-02 10:30:31 +01:00
Simon Pilgrim
e5b8c24cc0
[DAG] Add ComputeNumSignBits(FREEZE(X)) handling (#161507)
If X is known to never be undef/poison, then skip the freeze and return ComputeNumSignBits(X).
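
For readers unfamiliar with the quantity being computed, here is a minimal standalone sketch of what "number of sign bits" means for a 32-bit value. `numSignBits` is a hypothetical helper written for illustration; it is not the SelectionDAG implementation, which works on known-bits information rather than concrete values.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM's API): counts how many of the top bits of a
// 32-bit value are copies of the sign bit, the sign bit itself included.
unsigned numSignBits(int32_t Value) {
  uint32_t Bits = static_cast<uint32_t>(Value);
  uint32_t Sign = Bits >> 31; // 0 for non-negative values, 1 for negative ones
  unsigned Count = 1;         // the sign bit always counts
  for (int Pos = 30; Pos >= 0 && ((Bits >> Pos) & 1u) == Sign; --Pos)
    ++Count;
  return Count;
}
```

A freeze of a value that is neither undef nor poison yields that same value, so the count for the frozen value is the count for X.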
2025-10-02 07:27:17 +00:00
Matt Arsenault
9323fbbc4e
RegisterCoalescer: Avoid return after else (#161622)
2025-10-02 05:20:57 +00:00
Matt Arsenault
c6e280e7ed
PeepholeOpt: Fix losing subregister indexes on full copies (#161310)
Previously if we had a subregister extract reading from a
full copy, the no-subregister incoming copy would overwrite
the DefSubReg index of the folding context.

There's one ugly rvv regression, but it's a downstream
issue of this; an unnecessary same class reg-to-reg full copy
was avoided.
2025-10-02 13:36:47 +09:00
Nicolai Hähnle
11a4b2d950
Cleanup the LLVM exported symbols namespace (#161240)
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.
2025-10-01 15:32:07 -07:00
Michael Liao
fc6cc4009f
[AsmPrinter] Remove unnecessary casts. NFC
2025-10-01 14:23:42 -04:00
Benjamin Maxwell
372d3fb10c
[CodeGen] Remove shouldExpandPartialReductionIntrinsic() hook (NFC) (#161498)
This is unused. Targets can lower/expand the `PARTIAL_REDUCE_*` ISD
nodes.
2025-10-01 13:28:37 +01:00
Florian Hahn
e86b3386fd
[DAGCombine] Support (shl %x, constant) in foldPartialReduceMLAMulOp. (#160663)
Support shifts in foldPartialReduceMLAMulOp by treating (shl %x, %c) as
(mul %x, (shl 1, %c)).

PR: https://github.com/llvm/llvm-project/pull/160663
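
The identity behind this fold can be checked on plain unsigned integers. The sketch below is only an illustration with hypothetical helper names; it assumes a shift amount smaller than the bit width and does not model the DAG nodes themselves.

```cpp
#include <cstdint>

// Hypothetical helpers illustrating (shl x, c) == (mul x, (shl 1, c)).
// Both assume C < 32; unsigned arithmetic wraps modulo 2^32 in each case.
uint32_t shiftLeft(uint32_t X, unsigned C) { return X << C; }

uint32_t mulByPowerOfTwo(uint32_t X, unsigned C) {
  return X * (uint32_t{1} << C);
}
```

Because both forms wrap the same way, the equality also holds when high bits are shifted out.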
2025-10-01 09:06:01 +00:00
Matt Arsenault
9811226967
PeepholeOpt: Try to constrain uses to support subregister (#161338)
This allows removing a special case hack in ARM. ARM's implementation
of getExtractSubregLikeInputs has the strange property that it reports
a register with a class that does not support the reported subregister
index. We can however reconstrain the register to support this usage.

This is an alternative to #159600. I've included the test, but
the output is different. In this version, the VMOVSR is
replaced with an ordinary subregister extract copy.
2025-10-01 00:18:51 +09:00
paperchalice
c6d3b517ee
[DAGCombiner] Remove most NoSignedZerosFPMath uses (#161180)
The two remaining uses are related to fneg and foldFPToIntToFP. Some AMDGPU
tests are duplicated and regenerated.
2025-09-30 11:44:34 +08:00
Vladislav Dzhidzhoev
2d1f9c95d9
Reland "[DebugInfo][DwarfDebug] Separate creation and population of abstract subprogram DIEs" (#160786)
This is an attempt to reland
https://github.com/llvm/llvm-project/pull/159104 with the fix for
https://github.com/llvm/llvm-project/issues/160197.

The original patch had the following problem: when an abstract
subprogram DIE is constructed from within
`DwarfDebug::endFunctionImpl()`,
`DwarfDebug::constructAbstractSubprogramScopeDIE()` acknowledges `unit:`
field of DISubprogram. But an abstract subprogram DIE constructed from
`DwarfDebug::beginModule()` was put in the same compile unit to which
global variable referencing the subprogram belonged, regardless of
subprogram's `unit:`.

This is fixed by adding `DwarfDebug::getOrCreateAbstractSubprogramCU()`
used by both `DwarfDebug::constructAbstractSubprogramScopeDIE()` and
`DwarfCompileUnit::getOrCreateSubprogramDIE()` when abstract subprogram
is queried during the creation of DIEs for globals in
`DwarfDebug::beginModule()`.

The fix and the already-reviewed code from
https://github.com/llvm/llvm-project/pull/159104 are two separate
commits in this PR.

=====
The original commit message follows:

With this change, construction of abstract subprogram DIEs is split in
two stages/functions: creation of DIE (in
DwarfCompileUnit::getOrCreateAbstractSubprogramDIE) and its population
with children (in
DwarfCompileUnit::constructAbstractSubprogramScopeDIE).

With that, abstract subprograms can be created/referenced from
DwarfDebug::beginModule, which should solve the issue with static local
variables DIE creation of inlined functions with optimized-out
definitions. It fixes https://github.com/llvm/llvm-project/issues/29985.

LexicalScopes class now stores mapping from DISubprograms to their
corresponding llvm::Function's. It is supposed to be built before
processing of each function (so, now LexicalScopes class has a method
for "module initialization" alongside the method for "function
initialization"). It is used by DwarfCompileUnit to determine whether a
DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is
invoked.

DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can
create an abstract or a concrete DIE for a subprogram. It accepts
llvm::Function* argument to determine whether a concrete DIE must be
created.

This is a temporary fix for
https://github.com/llvm/llvm-project/issues/29985. Ideally, it will be
fixed by moving global variables and types emission to
DwarfDebug::endModule (https://reviews.llvm.org/D144007,
https://reviews.llvm.org/D144005).

Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in
https://github.com/llvm/llvm-project/pull/90523 was taken for this
commit.
2025-09-29 14:40:15 +02:00
paperchalice
84e4c0686e
[DAGCombiner] Remove NoSignedZerosFPMath uses in visitFSUB (#160974)
Remove NoSignedZerosFPMath uses in visitFSUB; we should always use
instruction-level fast-math flags.
2025-09-29 19:19:18 +08:00
DST
ce70773cff
Fix some typos in machine verifier comments and trace output (#160049)
Stumbled across a typo in the `MachineVerifier` file and since I had it
open, I changed some other comments.

Not important but why not leave it a bit cleaner 🙂

---------

Signed-off-by: Daniel Stadelmann <dasta_7@hotmail.com>
2025-09-29 10:23:09 +00:00
paperchalice
b0a755b2bf
[TargetLowering] Remove NoSignedZerosFPMath uses (#160975)
Remove NoSignedZerosFPMath uses in TargetLowering; users should always
use instruction-level fast-math flags.
2025-09-29 14:33:56 +08:00
A. Jiang
a558d65604
[CodeGen] Get rid of incorrect std template specializations (#160804)
This patch renames comparators
- from `std::equal_to<llvm::rdf::RegisterRef>` to
`llvm::rdf::RegisterRefEqualTo`, and
- from `std::less<llvm::rdf::RegisterRef>` to
`llvm::rdf::RegisterRefLess`.

The original specializations don't satisfy the requirements for the
original `std` templates by being stateful and
non-default-constructible, so they make the program have UB due to C++17
[namespace.std]/2, C++20/23 [namespace.std]/5.

> A program may explicitly instantiate a class template defined in the
standard library only if the declaration
> - depends on the name of at least one program-defined type, and
> - the instantiation meets the standard library requirements for the
original template.
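
The general pattern can be sketched as follows: a stateful comparator is given its own name and passed explicitly to the container instead of specializing `std::less`. `RegRef`, `RegRefLess`, the `IgnoreMask` flag and `countDistinct` are hypothetical stand-ins made up for this illustration; the real `llvm::rdf` types carry different state.

```cpp
#include <cstddef>
#include <set>

// Hypothetical stand-in for a register reference.
struct RegRef {
  unsigned Reg;
  unsigned Mask;
};

// A stateful comparator that is not default-constructible. Such a type
// cannot legally be a specialization of std::less, so it gets its own name.
struct RegRefLess {
  explicit RegRefLess(bool IgnoreMask) : IgnoreMask(IgnoreMask) {}

  bool operator()(const RegRef &A, const RegRef &B) const {
    if (A.Reg != B.Reg)
      return A.Reg < B.Reg;
    return !IgnoreMask && A.Mask < B.Mask;
  }

  bool IgnoreMask;
};

// The comparator type is named in the container type and an instance is
// handed to the constructor.
std::size_t countDistinct(bool IgnoreMask) {
  RegRefLess Less(IgnoreMask);
  std::set<RegRef, RegRefLess> Refs(Less);
  Refs.insert(RegRef{1, 0x1});
  Refs.insert(RegRef{1, 0x2});
  Refs.insert(RegRef{2, 0x1});
  return Refs.size();
}
```

With the mask ignored, the first two references compare equivalent and the set keeps two elements; otherwise it keeps three.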
2025-09-28 21:36:03 +08:00
Hongyu Chen
ebfee327df
[SDAG] Constant fold frexp in signed way (#161015)
Fixes #160981
The exponent of a floating-point number is signed. This patch
prevents treating it as unsigned.
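
A small standalone illustration of why signedness matters here, using the C++ standard library's `std::frexp` rather than the SelectionDAG constant folder. The helper names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Returns the exponent E such that Value == M * 2^E with 0.5 <= |M| < 1.
int exponentOf(double Value) {
  int Exp = 0;
  double Mantissa = std::frexp(Value, &Exp);
  (void)Mantissa; // only the exponent is of interest here
  return Exp;
}

// What the same exponent looks like if it is mistakenly treated as an
// unsigned 32-bit quantity.
uint32_t exponentAsUnsigned(double Value) {
  return static_cast<uint32_t>(exponentOf(Value));
}
```

For 0.125 (that is, 0.5 * 2^-2) the exponent is -2, which reads as 4294967294 when reinterpreted as unsigned.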
2025-09-28 10:14:30 +08:00
Matt Arsenault
446b9dcfeb
PeepholeOpt: Use initializer list (#160898)
2025-09-26 15:38:09 +00:00
Matt Arsenault
129394e3f2
Greedy: Make trySplitAroundHintReg try to match hints with subreg copies (#160294)
This is essentially the same patch as
116ca9522e89f1e4e02676b5bbe505e80c4d4933;
when trying to match a physreg hint, try to find a compatible physreg if
there is a subregister copy. This has the slight difference of using
getSubReg on the hint instead of getMatchingSuperReg (the other use should
also use getSubReg instead, it's faster).

At the moment this turns out to have very little effect. The adjacent
code needs better handling of subregisters, so continue adding this
piecemeal. The X86 test shows a net reduction in real instructions, plus a
few new kills.
2025-09-27 00:14:38 +09:00
Philip Reames
9412769c1e
Revert "[RegAlloc] Strengthen asserts in LiveRangeEdit::scanRemattable [nfc]" (#160897)
Reverts llvm/llvm-project#160765. Failures on buildbot indicate second
assertion does not in fact hold.
2025-09-26 07:55:59 -07:00
Philip Reames
550d425a71
[RegAlloc] Add printer and dump for VNInfo [nfc] (#160758)
Uses the existing format of the LiveRange printer, and just factors it
out so that you can do vni->dump() when debugging, or log a vni in a
debug print statement.
2025-09-26 14:37:35 +00:00
Philip Reames
bba9172778
[RegAlloc] Strengthen asserts in LiveRangeEdit::scanRemattable [nfc] (#160765)
We should always be able to find the VNInfo in the original live
interval which corresponds to the subset we're trying to spill, and the
only cases where we have a VNInfo without a definition instruction are
if the vni is unused, or corresponds to a phi. Adjust the code structure
to explicitly check for PHIDef, and assert the stronger conditions.
2025-09-26 06:55:07 -07:00
Philip Reames
78da592647
[RegAlloc] Add additional tracing in InlineSpiller::rematerializeFor (#160761)
We didn't have trace logging for two cases in this routine which makes
it sometimes hard to tell what is going on. In addition to debug trace
statements, add comments to explain the logic behind the early exits
which don't mark the virtual register live. Suggestions on how to word
these more precisely very welcome; I'm not clear I understand all the
intricacies of this code myself.
2025-09-26 06:54:11 -07:00
Philip Reames
84df4123e6
[CodeGen] Adjust global-split remat heuristic to match LICM (#160709)
This heuristic was originally added in 40c4aa with the stated purpose of
avoiding global split on long live ranges created by MachineLICM
hoisting trivially rematerializable instructions. In the meantime,
various backends have introduced non-trivial rematerialization cases,
MachineLICM gained an explicit triviality check, and we've reworked
our APIs to match naming-wise. Let's move this heuristic back to truly
trivial remat only.

This is a functional change, though somewhat hard to hit. This change
will cause non-trivially rematerializable instructions to be globally
split more often. This is likely a good thing since non-trivial remat
may not be legal at all possible points in the live interval, but may
cost slightly more compile time.

I don't have a motivating example; I found it when reviewing the callers
of isReMaterializable(MI).
2025-09-26 06:53:21 -07:00
Lewis Crawford
a27baf9c96
[SelectionDAG] Improve v2f16 maximumnum expansion (#160723)
On targets where f32 maximumnum is legal, but maximumnum on vectors of
smaller types is not legal (e.g. v2f16), try unrolling the vector first
as part of the expansion.

Only fall back to expanding the full maximumnum computation into
compares + selects if maximumnum on the scalar element type cannot be
supported.
2025-09-26 11:37:29 +01:00
Wenju He
745e1e6ad5
[CodeGen] Ignore requiresStructuredCFG check in canSplitCriticalEdge if successor is loop header (#154063)
This addresses a performance issue for our downstream GPU target that
sets requiresStructuredCFG to true. The issue is that EarlyMachineLICM
pass does not hoist loop invariants because a critical edge is not
split.
The critical edge's destination is a loop header. Splitting the critical
edge will not break structured CFG.

Add a nvptx test to demonstrate the issue since the target also
requires structured CFG.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-09-26 17:25:37 +08:00
Benjamin Maxwell
d357e965af
[RegisterCoalescer] Mark implicit-defs of super-registers as dead in remat (#159110)
Currently, something like:

```
$eax = MOV32ri -11, implicit-def $rax
%al = COPY $eax
```

Can be rematerialized as:
```
dead $eax = MOV32ri -11, implicit-def $rax
```

Which marks the full $rax as used, not just $al.

With this change, this is rematerialized as:

```
dead $eax = MOV32ri -11, implicit-def dead $rax, implicit-def $al
```

To indicate that only $al is used. 

Note: This issue is latent right now, but is exposed when #134408 is
applied, as it results in the register pressure being incorrectly
calculated (unless this patch is applied too).

I think this change is in line with past fixes in this area, notably:

059cead5ed

69cd121dd9
2025-09-26 10:08:10 +01:00
Pete Chou
a274ffe259
[MachineSink] Remove subrange of live-ins from super register as well. (#159145)
Post-RA machine sinking could sink a copy of sub-register into
a successor. However, the sub-register might not be removed from the
live-in bitmask of its super register in successor and then a later
pass, e.g, if-converter, may add an implicit use of the register from
live-in resulting in an use of an undefined register. This change makes
sure subrange of live-ins from super register could be removed as well.
2025-09-26 14:45:26 +09:00
paperchalice
1e01c02996
[DAGCombiner] Remove NoSignedZerosFPMath uses in visitFADD (#160635)
Remove these global flags and use node level flags instead.
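
As background on why a no-signed-zeros assumption matters for FADD folds at all, the sketch below checks whether rewriting `x + 0.0` to `x` preserves the sign of the result. It is a plain C++ illustration with a hypothetical helper name, assuming default IEEE-754 round-to-nearest and no fast-math compiler options; it does not reflect the DAGCombiner code.

```cpp
#include <cmath>

// Returns true if folding (X + 0.0) to X would keep the sign of the result.
bool addPlusZeroPreservesSign(double X) {
  double Sum = X + 0.0;
  return std::signbit(Sum) == std::signbit(X);
}
```

The fold is only unsafe for X = -0.0, where the sum is +0.0, which is why it needs a no-signed-zeros guarantee on the node in question.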
2025-09-26 11:24:02 +08:00
Luke Lau
be23cdc858
[RegAlloc] Account for use availability when applying rematerializable weight discount (#159180)
This aims to fix the issue that caused https://reviews.llvm.org/D106408
to be reverted.

CalcSpillWeights will reduce the weight of an interval by half if it's
considered rematerializable, so it will be evicted before others.

It does this by checking TII.isTriviallyReMaterializable. However
rematerialization may still fail if any of the defining MI's uses aren't
available at the locations it needs to be rematerialized.
LiveRangeEdit::canRematerializeAt calls allUsesAvailableAt to check this
but CalcSpillWeights doesn't, so the two diverge.

This fixes it by also checking allUsesAvailableAt in CalcSpillWeights. 
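
The resulting rule can be sketched as below. This is a hypothetical, simplified model of the decision described above, not the CalcSpillWeights code: the function name and the two boolean inputs are assumptions standing in for the `isTriviallyReMaterializable` and `allUsesAvailableAt` queries.

```cpp
// Hypothetical model: the weight is halved only when the defining
// instruction is rematerializable AND all of its uses are available where
// it would have to be rematerialized.
float discountedWeight(float Weight, bool TriviallyRemat,
                       bool AllUsesAvailable) {
  if (TriviallyRemat && AllUsesAvailable)
    return Weight * 0.5f;
  return Weight;
}
```

Before the change, the second condition was not part of the check, so an interval could receive the discount even though rematerialization would later fail.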

In practice this has zero change on AArch64/X86-64/RISC-V as measured on
llvm-test-suite, but prevents weights from being perturbed in an
upcoming patch which enables more rematerialization by re-attempting
https://reviews.llvm.org/D106408
2025-09-26 07:57:12 +08:00
AZero13
09bdbfd9d1
[CodeGenPrepare] Bail out of usubo creation if sub's parent is not the same as the comparison (#160358)
We match uadd's behavior here.

Codegen comparison: https://godbolt.org/z/x8j4EhGno
2025-09-25 22:55:01 +09:00
JaydeepChauhan14
0c1087b377
[X86][GlobalISel] Added support for llvm.set.rounding (#156591)
- This implementation is adapted from **SDAG
X86TargetLowering::LowerSET_ROUNDING**.
2025-09-25 22:44:47 +09:00
Matt Arsenault
c80d495908
GlobalISel: Adjust insert point when expanding G_[SU]DIVREM (#160683)

The insert point management is messy here. We probably should
have an insert point guard, and not have the dest operand utilities
modify the insert point.

Fixes #159716
2025-09-25 11:00:53 +00:00
Jay Foad
ed30414b0a
[MachineStripDebug] Remove debug instructions from inside bundles (#160297)
Some passes, like AMDGPU's SIInsertHardClauses, wrap sequences of
instructions into bundles, and these bundles may end up with debug
instructions in the middle. Assuming that this is allowed, this patch
fixes MachineStripDebug to be able to remove these instructions from
inside a bundle.
2025-09-25 09:40:54 +01:00
Afanasyev Ivan
3e639930d3
[CodeGen] Extract copy-paste on PHI MachineInstr income removal. (#158634)
2025-09-25 14:59:36 +09:00
Philip Reames
ea721e2fa1
[TII] Split isTrivialReMaterializable into two versions [nfc] (#160377)
This change builds on https://github.com/llvm/llvm-project/pull/160319
which tries to clarify which *callers* (not backends) assume that the
result is actually trivial.

This change itself should be NFC. Essentially, I'm just renaming the
existing isTrivialRematerializable to the non-trivial version and then
adding a new trivial version (with the same name as the prior function)
and simplifying a few callers which want that semantic.

This change does *not* enable non-trivial remat any more broadly than
was already done for our targets which were lying through the old APIs;
that will come separately. The goal here is simply to make the code
easier to follow in terms of what assumptions are being made where.

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2025-09-24 18:52:17 -07:00
AZero13
151a80bbce
[TargetLowering][ExpandABD] Prefer selects over usubo if we do the same for ucmp (#159889)
Same deal we use for determining ucmp vs scmp.

Using selects on platforms that like selects is better than using usubo.

Rename function to be more general fitting this new description.
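
To make the two strategies concrete, here is a sketch of an unsigned absolute difference computed both ways on 32-bit integers. The function names are hypothetical, and the borrow-based variant is one possible way to use a usubo-style overflow bit; it is not claimed to be the exact sequence TargetLowering emits.

```cpp
#include <cstdint>

// Select-based form: compare, then pick the non-wrapping subtraction.
uint32_t abduSelect(uint32_t A, uint32_t B) {
  return A > B ? A - B : B - A;
}

// Borrow-based form: subtract once, then conditionally negate using the
// overflow (borrow) bit that a usubo-style operation would produce.
uint32_t abduUsubo(uint32_t A, uint32_t B) {
  uint32_t Diff = A - B;               // wraps when A < B
  uint32_t Mask = (A < B) ? ~0u : 0u;  // all-ones if the subtraction borrowed
  return (Diff ^ Mask) - Mask;         // negates Diff exactly when Mask is set
}
```

Both forms return the same value; which one is cheaper depends on whether the target prefers selects or overflow-producing arithmetic.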
2025-09-25 10:33:05 +09:00
Rahman Lavaee
59b4074037
[Propeller] Read the CFG profile from the propeller directive. (#160422)
The CFG allows us to do layout optimization in the compiler.
Furthermore, it allows further branch optimization.
2025-09-24 11:52:11 -07:00
David Green
88a2f405ac
[Debug][AArch64] Do not crash on unknown subreg register sizes. (#160442)
The AArch64 zsub regs are scalable, so defined with a size of -1 (which
comes through as 65535). The RegisterSize is only 128, so code to try
and find overlapping regs of a z30_z31 in DwarfEmitter can crash on
trying to access out of range bits in a BitVector. Hexagon and x86 also
contain subregs with unknown sizes.

Ideally most of these would be scalable values but in the meantime add a
check that the registers are small enough to overlap with the current
register size, to prevent us from crashing.
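
The failure mode and the guard can be sketched with a plain `std::vector<bool>` in place of LLVM's BitVector. All names below are hypothetical and the 16-bit storage of the size is taken from the description above; this is not the DwarfEmitter code.

```cpp
#include <cstdint>
#include <vector>

// A size of -1 stored in 16 bits reads back as 65535.
unsigned unknownSizeAsStored() { return static_cast<uint16_t>(-1); }

// Marks bits [Offset, Offset + Size) as covered, but only if the range fits
// inside the vector. Returns false (and changes nothing) otherwise.
bool markCoveredBits(std::vector<bool> &Covered, unsigned Offset,
                     unsigned Size) {
  if (Offset > Covered.size() || Size > Covered.size() - Offset)
    return false; // unknown or oversized subregister: skip instead of crashing
  for (unsigned I = 0; I < Size; ++I)
    Covered[Offset + I] = true;
  return true;
}
```

With a 128-bit register, a subregister reporting 65535 bits is rejected by the guard, while a 64-bit subregister at offset 64 is accepted.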

This fixes the issue reported on #153810.
2025-09-24 12:40:12 +02:00
Jonas Paulsson
eaff28c93e
[MachineScheduler] Turn SU->isScheduled check into an assert in pickNode() (#160145)
It is unnecessary and confusing to have a do/while loop that checks
SU->isScheduled as this should never be true.

ScheduleDAGMI::updateQueues() is always called after pickNode() and it
sets isScheduled on the SU. Turn this into an assertion instead.
2025-09-24 10:28:35 +02:00
Philip Reames
5c1df39f41
Revert "Speculative buildbot fix after ca2e8f"
This reverts commit bd2dac98ed4f19dcf90c098ae0b9976604880b59, and part of ca2e8fc928ad103f46ca9f827e147c43db3a5c47.

My speculative attempt at fixing buildbot failed, so just roll back the
relevant part of the change.
2025-09-23 16:17:09 -07:00
Philip Reames
8b7a76a2ac
[CodeGen] Rename isReallyTriviallyReMaterializable [nfc]
.. to isReMaterializableImpl.  The "Really" naming has always been
awkward, and we're working towards removing the "Trivial" part now,
so go ahead and remove both pieces in a single rename.

Note that this doesn't change any aspect of the current
implementation; we still "mostly" only return instructions which
are trivial (meaning no virtual register uses), but some targets
do lie about that today.
2025-09-23 11:58:37 -07:00
Philip Reames
ca2e8fc928
Update callers of isTriviallyReMaterializable to check trivialness (#160319)
This is a preparatory change for an upcoming reorganization of our
rematerialization APIs. Despite the interface being documented as
"trivial" (meaning no virtual register uses on the instruction being
considered for remat), our actual implementation inconsistently supports
non-trivial remat, and certain backends (AMDGPU and RISC-V mostly) lie
about instructions being trivial to abuse that. We want to allow
non-trivial remat more broadly, but first we need to do some cleanup to
make it understandable what's going on.

These three call sites are ones which appear to actually want the
trivial definition, and appear fairly low risk to change.

p.s. I'm deliberately *not* updating any APIs in this change, I'm going
to do that as a followup once it's clear which category each callsite
fits in.
2025-09-23 11:57:36 -07:00
Vladislav Dzhidzhoev
310811af6d
Revert "[DebugInfo][DwarfDebug] Separate creation and population of abstract subprogram DIEs" (#160349)
Reverts llvm/llvm-project#159104 due to the issues reported in
https://github.com/llvm/llvm-project/issues/160197.
2025-09-23 19:44:34 +02:00
Elizaveta Noskova
1132e82a61
[MIR] Support save/restore points with independent sets of registers (#119358)
This patch adds the MIR parsing and serialization support for save and
restore points with subsets of callee saved registers. That is, it
syntactically allows a function to contain two or more distinct
sub-regions in which distinct subsets of registers are spilled/filled as
callee save. This is useful if e.g. one of the CSRs isn't modified in
one of the sub-regions, but is in the other(s).

Support for actually using this capability in code generation is still
forthcoming. This patch is the next logical step for multiple
save/restore points support.

All points are now stored in DenseMap from MBB to vector of
CalleeSavedInfo.

Shrink-Wrap points split Part 4.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581

Part 1: https://github.com/llvm/llvm-project/pull/117862 (landed)
Part 2: https://github.com/llvm/llvm-project/pull/119355 (landed)
Part 3: https://github.com/llvm/llvm-project/pull/119357 (landed)
Part 5: https://github.com/llvm/llvm-project/pull/119359 (likely to be
further split)
2025-09-23 11:54:52 +03:00
Matt Arsenault
d08e4458e7
Greedy: Make eviction broken hint cost use CopyCost units (#160084)
Change the eviction advisor heuristic cost based on number of broken
hints to work in units of copy cost, rather than a magic number 1.
The intent is to allow breaking hints for cheap subregisters in favor
of more expensive register tuples.

The llvm.amdgcn.image.dim.gfx90a.ll change shows a simple example of
the case I am attempting to solve. Use of tuples in ABI contexts ends up
looking like this:

  %argN = COPY $vgprN
  %tuple = inst %argN
  $vgpr0 = COPY %tuple.sub0
  $vgpr1 = COPY %tuple.sub1
  $vgpr2 = COPY %tuple.sub2
  $vgpr3 = COPY %tuple.sub3

Since there are physreg copies in the input and output sequence,
both have hints to a physreg. The wider tuple hint on the output
should win though, since this satisfies 4 hints instead of 1.

This is the obvious part of a larger change to better handle
subregister interference with register tuples, and is not sufficient
to handle the original case I am looking at. There are several bugs here
that are proving tricky to untangle. In particular, there is a double
counting bug for all registers with multiple regunits; the cost of
breaking the interfering hint is added for each interfering virtual
register, which have repeat visits across regunits. Fixing the double
counting badly regresses a number of RISCV tests, which seem to rely on
overestimating the cost in tryFindEvictionCandidate to avoid early-exiting
the eviction candidate loop (RISCV is possibly underestimating the copy
costs for vector registers).
2025-09-23 08:03:01 +09:00
Prabhu Rajasekaran
42b195e1bf
[llvm][AsmPrinter] Add direct calls to callgraph section (#155706)
Extend CallGraphSection to include metadata about direct calls. This
simplifies the design of tools that must parse .callgraph section to not
require dependency on MC layer.
2025-09-22 14:34:01 -07:00
Tobias Stadler
dfbd76bda0
[Remarks] Restructure bitstream remarks to be fully standalone (#156715)
Currently there are two serialization modes for bitstream Remarks:
standalone and separate. The separate mode splits remark metadata (e.g.
the string table) from actual remark data. The metadata is written into
the object file by the AsmPrinter, while the remark data is stored in a
separate remarks file. This means we can't use bitstream remarks with
tools like opt that don't generate an object file. Also, it is confusing
to post-process bitstream remarks files, because only the standalone
files can be read by llvm-remarkutil. We always need to use dsymutil
to convert the separate files to standalone files, which only works for
MachO. It is not possible for clang/opt to directly emit bitstream
remark files in standalone mode, because the string table can only be
serialized after all remarks were emitted.

Therefore, this change completely removes the separate serialization
mode. Instead, the remark string table is now always written to the end
of the remarks file. This requires us to tell the serializer when to
finalize remark serialization. This automatically happens when the
serializer goes out of scope. However, often the remark file goes out of
scope before the serializer is destroyed. To diagnose this, I have added
an assert to alert users that they need to explicitly call
finalizeLLVMOptimizationRemarks.
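
The "string table at the end, explicit finalization" idea can be sketched with a toy writer. Everything here (`RemarkWriter`, `emit`, `finalize`, `writeRemarks`, and the text format) is hypothetical and chosen for illustration; it does not reproduce the LLVM remarks API or the bitstream layout.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy writer: remarks refer to strings by index, and the table of strings
// can only be written once all remarks have been emitted.
class RemarkWriter {
public:
  explicit RemarkWriter(std::ostream &OS) : OS(OS) {}

  // Finalizes on destruction, which is only safe while the stream is alive.
  ~RemarkWriter() { finalize(); }

  void emit(const std::string &PassName) {
    auto It = Index.find(PassName);
    if (It == Index.end()) {
      It = Index.emplace(PassName, Strings.size()).first;
      Strings.push_back(PassName);
    }
    OS << "remark " << It->second << '\n';
  }

  // Writes the string table after the remarks; calling it twice is harmless.
  void finalize() {
    if (Finalized)
      return;
    Finalized = true;
    OS << "strtab";
    for (const std::string &S : Strings)
      OS << ' ' << S;
    OS << '\n';
  }

private:
  std::ostream &OS;
  std::map<std::string, std::size_t> Index;
  std::vector<std::string> Strings;
  bool Finalized = false;
};

// Emits one remark per pass name and finalizes explicitly before the
// stream's contents are read back.
std::string writeRemarks(const std::vector<std::string> &Passes) {
  std::ostringstream OS;
  RemarkWriter Writer(OS);
  for (const std::string &P : Passes)
    Writer.emit(P);
  Writer.finalize();
  return OS.str();
}
```

The explicit `finalize()` call mirrors the situation described above: if the output were closed before the writer is destroyed, relying on the destructor alone would lose the string table.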

This change paves the way for further improvements to the remark
infrastructure, including more tooling (e.g. #159784), size optimizations
for bitstream remarks, and more.

Pull Request: https://github.com/llvm/llvm-project/pull/156715
2025-09-22 16:41:39 +01:00
YixingZhang007
f91e0bf160
[SPIRV] Add support for the SPIR-V extension SPV_KHR_bfloat16 (#155645)
This PR introduces the support for the SPIR-V extension
`SPV_KHR_bfloat16`. This extension extends the `OpTypeFloat` instruction
to enable the use of bfloat16 types with cooperative matrices and dot
products.

TODO:
Per the `SPV_KHR_bfloat16` extension, there are a limited number of
instructions that can use the bfloat16 type. For example, arithmetic
instructions like `FAdd` or `FMul` can't operate on `bfloat16` values.
Therefore, a future patch should be added to either emit an error or
fall back to FP32 for arithmetic in cases where bfloat16 must not be
used.

Reference Specification:

https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_bfloat16.asciidoc
2025-09-22 14:52:57 +02:00
Matt Arsenault
c077822b52
Regalloc: Add operator >= to EvictionCost (#160070)
Make the actual use context less ugly.
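
A minimal sketch of the idea, assuming a cost type ordered lexicographically by two fields. The struct and field names are hypothetical stand-ins chosen for illustration and are not taken from the register allocator sources.

```cpp
#include <tuple>

// Hypothetical cost type: compared first by broken hints, then by weight.
struct Cost {
  unsigned BrokenHints;
  float MaxWeight;

  bool operator<(const Cost &O) const {
    return std::tie(BrokenHints, MaxWeight) <
           std::tie(O.BrokenHints, O.MaxWeight);
  }

  // Defined in terms of operator< so callers can write A >= B directly
  // instead of !(A < B).
  bool operator>=(const Cost &O) const { return !(*this < O); }
};
```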
2025-09-22 20:25:34 +09:00
Jay Foad
cecdff9283
Greedy: Simplify collectHintInfo using MachineOperands. NFCI. (#159724)
If a COPY uses Reg but only in an implicit operand then the new
implementation ignores it but the old implementation would have treated
it as a copy of Reg. Probably this case never occurs in practice. Other
than that, this patch is NFC.

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-09-22 11:48:52 +01:00