37667 Commits

Author SHA1 Message Date
Philip Reames
d1b3eeb244
[SDAG] Merge memcpy and memcpy.inline lowering paths (#138619)
This is a follow up to c0a264e, but note that there is a functional
difference here: the root changes for the memcpy.inline case. This
difference appears to have been accidental, but I kept this back to
facility separate review in case there's something I'm missing here.
2025-05-06 07:37:44 -07:00
Sander de Smalen
d90cac9641
[DAGCombine] Simplify partial_reduce_*mla with constant. (#138289)
partial_reduce_*mla(acc, mul(ext(x), splat(C)), splat(1))
-> partial_reduce_*mla(acc, x, C)
2025-05-06 13:51:52 +01:00
Simon Pilgrim
bde39d7251
[DAG] Add SDPatternMatch::m_BitwiseLogic common matcher for AND/OR/XOR nodes (#138301) 2025-05-06 12:50:50 +01:00
Cullen Rhodes
8ea5eacea2
[MISched] Fix off-by-one error in debug output with -misched-cutoff=<n> flag (#137988)
This flag instructs the scheduler to stop scheduling after N
instructions, but
in the debug output it appears as if it's scheduling N+1 instructions,
e.g.

$ llc -misched-cutoff=10 -debug-only=machine-scheduler
example.ll 2>&1 | grep "^Scheduling SU" | wc -l
11

as it calls pickNode before calling checkSchedLimit.
2025-05-06 11:12:23 +01:00
jyli0116
fd80048738
[GlobalISel][AArch64] Handles bitreverse to prevent falling back (#138150)
Handles bitreverse for vector types which were previously falling back
onto Selection DAG. Includes 8-bit element vectors greater than 128 bits
and less than 64 bits: <32 x i8>, <4 x i8>, and odd vector types: <9 x
i8>.
2025-05-06 09:57:01 +01:00
Philip Reames
c0a264e6a9
[IntrinsicInst] Remove MemCpyInlineInst and MemSetInlineInst [nfc] (#138568)
I'm looking for ways to simplify the Mem*Inst class structure, and these
two seem to have fairly minimal justification, so let's remove them.
2025-05-05 14:07:31 -07:00
Jeffrey Byrnes
00e7a02295
[ScheduleDAG] Allow disabling the SchedModel / Itineraries during Scheduling (#138057)
This provides the `disable-schedmodel-in-sched-mi` flag. Using this, we
will disable the SchedModel / Itineraries during scheduling. This has
the effect of not using any latency / hardware resource information for
scheduling decisions.

We have the `schedmodel` flag, but this disables the `SchedModel` for
all passes. This allows disabling only for scheduling while preserving
the behavior of other passes (e.g. MachineLICM). This is conceptually
similar to other flags like `enable-aa-sched-mi`
2025-05-05 14:07:23 -07:00
Kazu Hirata
cdc9a4b5f8
[CodeGen] Use range-based for loops (NFC) (#138488)
This is a reland of #138434 except that:

- the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp
  have been dropped because they caused a test failure under asan, and

- the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have
  been improved with structured bindings.
2025-05-05 10:08:49 -07:00
Kazu Hirata
f81193ddfd
[SelectionDAG] Remove obsolete comments (NFC) (#138483)
These functions do not return boolean values.
2025-05-05 10:08:19 -07:00
KRM7
0926d94453
[GlobalISel] Take the result size into account when const folding icmp (#134365)
The current implementation always creates a 1 bit constant for the
result of the `G_ICMP`, which will cause issues if the destination
register size is larger than that. With asserts enabled, it will cause a
crash in `buildConstant`:
```
llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp:322: virtual MachineInstrBuilder llvm::MachineIRBuilder::buildConstant(const DstOp &, const ConstantInt &): Assertion `EltTy.getScalarSizeInBits() == Val.getBitWidth() && "creating constant with the wrong size"' failed. 
```
2025-05-05 19:01:53 +02:00
Kazu Hirata
aa15596b5f
[llvm] Remove unused local variables (NFC) (#138478) 2025-05-04 21:33:54 -07:00
Nico Weber
1d955489c3 Revert "[CodeGen] Use range-based for loops (NFC) (#138434)"
This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b.

Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs
(sanitizer builds; macOS; possibly more), see comments on
https://github.com/llvm/llvm-project/pull/138434
2025-05-04 17:36:52 -04:00
Kazu Hirata
c51a3aa6ce
[llvm] Remove unused local variables (NFC) (#138467) 2025-05-04 13:05:18 -07:00
Kazu Hirata
47f391fd0e
[CodeGen] Remove unused local variables (NFC) (#138441) 2025-05-04 00:26:37 -07:00
Kazu Hirata
a9699a334b
[CodeGen] Use range-based for loops (NFC) (#138434) 2025-05-04 00:26:19 -07:00
Kazu Hirata
aa613777af
[llvm] Remove redundant control flow (NFC) (#138304) 2025-05-02 10:34:25 -07:00
Simon Pilgrim
b5dbddd200
[DAG] visitEXTRACT_SUBVECTOR - change fold helper methods to take operands instead of EXTRACT_SUBVECTOR node. NFC. (#138279)
Call with the individual subvector type, source vector and index operands instead of the original EXTRACT_SUBVECTOR node.

Some of these folds still assumed that EXTRACT_SUBVECTOR/INSERT_SUBVECTOR nodes could have variable indices, despite us moving to all constant indices some time ago - all of that code has now been simplified.

I've moved the narrowExtractedVectorBinOp call higher up, but it won't affect fold order - it didn't rely on the peekThroughBitcasts call, and worked on BinOps, not BUILD_VECTOR/INSERT_SUBVECTOR nodes.

Prep work to make it easier for more of these folds to work through BITCAST nodes.
2025-05-02 17:24:56 +01:00
Kazu Hirata
4ec473e0e1
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#138236) 2025-05-02 08:53:53 -07:00
Alexander Peskov
cffb8aee14
[DEBUGINFO] Propagate debug metadata for sext SDNode. (#135971)
In some cases of chained `sext` operators the debug metadata can be
missed. This patch propagates proper metadata to resulting node.

Particular case of issue is NVPTX codegen for function with bool local
variable:
```
void test(int i) {
  bool xyz = i == 0;
  foo(i);
}
```

---------

Signed-off-by: Alexander Peskov <apeskov@nvidia.com>
2025-05-02 08:31:41 -04:00
Benjamin Maxwell
40edb37bb3
[StackSlotColoring] Fix issue where colors for a StackID are dropped (#138140)
In InitializeSlots, if an interval with a non-zero StackID (A) is
encountered we set All/UsedColors to a size of A + 1. If after this we
process another interval with a non-zero StackID (B), where B < A, then
we resize All/UsedColors to size < A + 1, and lose the BitVector
associated with A.

AFAIK this is a latent bug upstream, but when adding a new TargetStackID
locally I hit a `NextColors[StackID] != -1 && "No more spill slots?"`
assertion due to this issue.
2025-05-02 10:23:39 +01:00
Nikita Popov
4109bac330
[IR] Do not store Function inside BlockAddress (#137958)
Currently BlockAddresses store both the Function and the BasicBlock they
reference, and the BlockAddress is part of the use list of both the
Function and BasicBlock.

This is quite awkward, because this is not really a use of the function
itself (and walks of function uses generally skip block addresses for
that reason). This also has weird implications on function RAUW (as that
will replace the function in block addresses in a way that generally
doesn't make sense), and causes other peculiar issues, like the ability
to have multiple block addresses for one block (with different
functions).

Instead, I believe it makes more sense to specify only the basic block
and let the function be implied by the BB parent. This does mean that we
may have block addresses without a function (if the BB is not inserted),
but this should only happen during IR construction.
2025-05-02 09:40:50 +02:00
Jim Lin
f46ff4c204 [NFC][regalloc] Fix typo in llvm/lib/CodeGen/AllocationOrder.h. 2025-05-02 10:11:34 +08:00
Rahul Joshi
f24606376d
[NFC][LLVM][CodeGen] Refactor MIR Printer (#137361)
- Move `MIPrinter` class to anonymous namespace, and remove it as a
friend of `MachineBasicBlock`.
- Move `canPredictBranchProbabilities` to `MachineBasicBlock` and change
it to use the new `BranchProbability::normalizeProbabilities` function
that accepts a range, and also to use `llvm::equal()` to check equality
of the two vectors.
- Use `ListSeparator` to print comma separate lists instead of manual
code to do that.
2025-05-01 10:00:54 -07:00
Philip Reames
2bb2f8ab49
[CodeGen] Remove experimental deferred spilling from GreedyRegAlloc (#137850)
This experimental option was introduced in 2015 via commit 1192294, and
the target hook was added in 2020 via commit 99e865b6. There does not
appear to have ever been a use of this target hook in tree.

This code is complicating one of the most complicated and hard to
understand parts of our code base, and was an experiment introduced
nearly 10 years ago. Let's get rid of it.

Note that the idea described in the original patch is not neccessarily a
bad one, and we might return to it someday.
2025-05-01 08:11:51 -07:00
Rahul Joshi
64f552cefa
[NFC][LLVM][CodeGen] Refactor MachineInstr operand accessors (#137261)
- Change MachineInstr operand accessors to use `ArrayRef` internally to
slice the operand array into sub-arrays.
- Minor: remove unnecessary {} on `MachineInstrBuilder::add`.
2025-05-01 07:45:22 -07:00
Nicholas Guy
b6f65f07bc
[SelectionDAG] Improve type legalisation for PARTIAL_REDUCE_MLA (#130935)
Implement proper splitting functions for PARTIAL_REDUCE_MLA ISD nodes.
This makes the udot_8to64 and sdot_8to64 tests generate dot product
instructions for when the new ISD nodes are used.

---------

Co-authored-by: James Chesterman <james.chesterman@arm.com>
2025-05-01 15:08:46 +01:00
David Green
9b1051281e
[DAG] Use SDValue for PatFrag checks (#137519)
If the SDNode is used it can pick up the wrong results number, for
example looking at the known bits of the first result where it should be
looking at the second. The SDValue is already present as the
SelectCodeCommon checks move from parent to child, pass the SDValue
through to CheckNodePredicate as Op so that it can use it if necessary.
SDNode *N is still generated, keeping most PatFrags the same.

Fixes #137274
2025-05-01 08:58:59 +01:00
Jonathan Thackray
6e49f73825
Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-30 22:06:37 +01:00
mssefat
7495f92f08
[AMDGPU] Fix undefined scc register in successor block of SI_KILL terminators (#134718)
Fix issue 131298 where an undefined $scc register causes verifier errors
when using SI_KILL_F32_COND_IMM_TERMINATOR instructions. The problem
occurs because the $scc register defined in a comparison before the kill
terminator is used in successor blocks, but was not properly marked as live-in.

This patch:
- Adds code to check if SCC is used in the successor block
- Adds SCC as a live-in to successor blocks
- Handles both explicit and implicit uses of SCC

With this patch the machine verifier no longer reports undefined $scc
errors in following kill terminator instruction.

Fixes #131298

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-04-30 09:02:45 -05:00
Jie Fu
6e43cdbc25 [CodeGen] Remove unused variable 'ID' (NFC)
/llvm-project/llvm/lib/CodeGen/VirtRegMap.cpp:225:15:
error: unused variable 'ID' [-Werror,-Wunused-variable]
  static char ID;
              ^
1 error generated.
2025-04-30 19:15:27 +08:00
Stephen Tozer
92195f6fc8 Reapply "[DLCov] Implement DebugLoc coverage tracking (#107279)"
Reapplied after fixing the config issue that was causing issues following
the previous merge.

This reverts commit fdbf073a86573c9ac4d595fac8e06d252ce1469f.
2025-04-30 11:39:29 +01:00
Akshat Oke
e91cbd4f29
[CodeGen][NPM] Port VirtRegRewriter to NPM (#130564) 2025-04-30 14:10:46 +05:30
YunQiang Su
db859db74d Revert "CodeGen: Add ISD::AssertNoFPClass (#135946)"
This reverts commit f0c61d2242bbc7576ca5e4137a5ea8f63e4859a9.
2025-04-30 16:16:26 +08:00
Vikram Hegde
53a8b89003
[CodeGen][NewPM] Port "ShrinkWrap" pass to NPM (#129880) 2025-04-30 13:11:17 +05:30
paperchalice
159628cc22
[CodeGen] Port MachineUniformityAnalysis to new pass manager (#137578)
- Add new pass manager version of `MachineUniformityAnalysis `.
- Query `TargetTransformInfo` in new pass manager version.
- Use `printAsOperand` when printing machine function name
2025-04-30 10:44:06 +08:00
Sergei Barannikov
becd418626
[CGP] Despeculate ctlz/cttz with "illegal" integer types (#137197)
The code below the removed check looks generic enough to support
arbitrary integer widths. This change helps 32-bit targets avoid
expensive expansion/libcalls in the case of zero input.

Pull Request: https://github.com/llvm/llvm-project/pull/137197
2025-04-29 22:33:40 +03:00
Tobias Stadler
0b5daeb2e5
[GlobalISel] Fix miscompile when narrowing vector loads/stores to non-byte-sized types (#136739)
LegalizerHelper::reduceLoadStoreWidth does not work for non-byte-sized
types, because this would require (un)packing of bits across byte
boundaries.

Precommit tests: #134904
2025-04-29 12:36:34 +01:00
Vikram Hegde
86d8e8d9a6
[CodeGen][NewPM] Port "PrologEpilogInserter" to NPM (#130550) 2025-04-29 13:13:45 +05:30
weiguozhi
b25b51eb63
[InlineSpiller] Check rematerialization before folding operand (#134015)
Current implementation tries to fold the operand before
rematerialization because it can reduce one register usage. But if there
is a physical register available we can still rematerialize it without
causing high register pressure.

This patch do this check to find the better choice. Then we can produce

    xorps %xmm1, %xmm1
    ucomiss %xmm1, %xmm0

instead of 

    ucomiss LCPI0_1(%rip), %xmm0
2025-04-28 09:52:03 -07:00
Jonathan Thackray
7ee0097b48
Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657)
Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4
2025-04-28 16:53:36 +01:00
Jonathan Thackray
ba420d8122
[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-28 15:31:44 +01:00
Paul Walker
be82be281d
[LLVM][GlobalISel] Ensure G_{F}CONSTANT only store references to scalar Constant{Int,FP}. (#137319) 2025-04-28 11:40:39 +01:00
John Brawn
dd87127f4e
[DAGCombiner] Eliminate fp casts if we have the right fast math flags (#131345)
When floating-point operations are legalized to operations of a higher
precision (e.g. f16 fadd being legalized to f32 fadd) then we get
narrowing then widening operations between each operation. With the
appropriate fast math flags (nnan ninf contract) we can eliminate these
casts.
2025-04-28 11:21:51 +01:00
Craig Topper
e17f07c4de
[SelectionDAG] Reduce code duplication between getStore, getTruncStore, and getIndexedStore. (#137435)
Create an extra overload of getStore that can handle of the 3 types of
stores. This is similar to how getLoad/getExtLoad/getIndexLoad is
structure.
2025-04-27 22:32:53 -07:00
Owen Rodley
d3d856ad84
Clean up external users of GlobalValue::getGUID(StringRef) (#129644)
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.

This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.

This does the following:

* Rename GlobalValue::getGUID(StringRef) to
  getGUIDAssumingExternalLinkage. This is simply making explicit at the
  callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
  GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
  getGUIDAssumingExternalLinkage, to make the assumption explicit.


There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
2025-04-28 11:09:43 +10:00
Kazu Hirata
5cfd81b0cc
[llvm] Use range constructors of *Set (NFC) (#137552) 2025-04-27 15:59:57 -07:00
Kazu Hirata
8210cdd764
[llvm] Use llvm::replace (NFC) (#137481) 2025-04-26 18:18:09 -07:00
Kazu Hirata
8ba3a232d1
[llvm] Use llvm::copy (NFC) (#137470) 2025-04-26 15:50:38 -07:00
Sergei Barannikov
bb1765179e
[TTI] Simplify implementation (NFCI) (#136674)
Replace "concept based polymorphism" with simpler PImpl idiom.

This pursues two goals:
* Enforce static type checking. Previously, target implementations hid
base class methods and type checking was impossible. Now that they
override the methods, the compiler will complain on mismatched
signatures.
* Make the code easier to navigate. Previously, if you asked your
favorite LSP server to show a method (e.g. `getInstructionCost()`), it
would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`,
`TTIImplBase`, and target overrides. Now it is two less :)

There are three commits to hopefully simplify the review.

The first commit removes `TTI::Model`. This is done by deriving
`TargetTransformInfoImplBase` from `TTI::Concept`. This is possible
because they implement the same set of interfaces with identical
signatures.

The first commit makes `TargetTransformImplBase` polymorphic, which
means all derived classes should `override` its methods. This is done in
second commit to make the first one smaller. It appeared infeasible to
extract this into a separate PR because the first commit landed
separately would result in tons of `-Woverloaded-virtual` warnings (and
break `-Werror` builds).

The third commit eliminates `TTI::Concept` by merging it with the only
derived class `TargetTransformImplBase`. This commit could be extracted
into a separate PR, but it touches the same lines in
`TargetTransformInfoImpl.h` (removes `override` added by the second
commit and adds `virtual`), so I thought it may make sense to land these
two commits together.

Pull Request: https://github.com/llvm/llvm-project/pull/136674
2025-04-26 15:25:40 +03:00
David Green
b9e32749d2
[GlobalISel] Clear nsw flags when converting sub to add. (#137288)
As shown in https://alive2.llvm.org/ce/z/PVwcTL we need to clear the nsw
flags too when converting a sub to a add if the constant is INT_MIN.

Fixes #137254
2025-04-26 11:00:53 +01:00