38715 Commits

Author SHA1 Message Date
Matt Arsenault
253ed52436
DAG: Use poison for some vector result widening (#168290) 2025-11-19 16:49:43 -05:00
Matt Arsenault
a757c4e74e
CodeGen: Add subtarget to TargetLoweringBase constructor (#168620)
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
2025-11-19 19:18:13 +00:00
Matt Arsenault
0b921f52cc
DAG: Use poison when splitting vector_shuffle results (#168176) 2025-11-19 12:27:08 -05:00
Ryan Cowan
58e6d02aa2
[AArch64][GlobalISel] Check unmergeSrc is a vector in matchCombineBuildUnmerge (#168692)
This aims to fix the crash in #168495. My combine rule was
missing a check that the source vector was in fact a vector. This then
caused the legality check to fail in this example, as the concat was
trying to concat a non-vector.

I have also gated the bitcast of the concat to only work on non-scalable
vectors as the mutation calls `getNumElements` which crashes when called
on a scalable vector.

Fixes #168495
2025-11-19 12:30:51 +00:00
陈子昂
e38529ddbb
[DAG] Update canCreateUndefOrPoison to handle ISD::VECTOR_COMPRESS (#168010)
Fixes #167710
2025-11-19 10:21:05 +00:00
Tom Tromey
1262acf4ec
Introduce DwarfUnit::addBlock helper method (#168446)
This patch is just a small cleanup that unifies the various spots that
add a DWARF expression to the output.
2025-11-18 22:59:36 +00:00
Craig Topper
1157a22134
[GISel] Use getScalarSizeInBits in LegalizerHelper::lowerBitCount (#168584)
For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The
lowering should be based on the element width.

I noticed this by inspection. No tests in tree are currently affected,
but I thought it would be good to fix so someone doesn't have to debug
it in the future.
2025-11-18 12:26:47 -08:00
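To illustrate why the element width matters in the lowerBitCount change (#168584), here is a minimal standalone sketch of a smear-then-popcount CTLZ expansion. It is not the LLVM implementation; the names `popCount`, `ctlzExpand`, and `ctlzV4I8` are hypothetical, and a `<4 x i8>` value is modeled as a plain `std::array`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Count set bits with a plain loop (stand-in for a CTPOP operation).
inline unsigned popCount(uint64_t X) {
  unsigned N = 0;
  for (; X != 0; X &= X - 1) // clear the lowest set bit each iteration
    ++N;
  return N;
}

// Generic CTLZ expansion: smear the highest set bit into all lower bits,
// then ctlz = Width - popcount(smeared). Width is the number of bits per value.
inline unsigned ctlzExpand(uint64_t X, unsigned Width) {
  assert(Width >= 1 && Width <= 64 && "illustrative range only");
  assert((Width == 64 || (X >> Width) == 0) && "value must fit in Width bits");
  for (unsigned Shift = 1; Shift < Width; Shift *= 2)
    X |= X >> Shift;
  return Width - popCount(X);
}

// Apply the expansion lane by lane to a hypothetical <4 x i8> value.
// WidthUsed is the width the lowering believes each value has.
inline std::array<unsigned, 4> ctlzV4I8(const std::array<uint8_t, 4> &V,
                                        unsigned WidthUsed) {
  std::array<unsigned, 4> R{};
  for (std::size_t I = 0; I < V.size(); ++I)
    R[I] = ctlzExpand(V[I], WidthUsed);
  return R;
}
```

For an 8-bit lane holding the value 1, `ctlzExpand(1, 8)` gives 7, whereas using the total 32-bit size of the vector, `ctlzExpand(1, 32)`, gives 31, which is not a valid per-element result.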
Craig Topper
96e58b83a3
[RISCV] Legalize misaligned unmasked vp.load/vp.store to vle8/vse8. (#167745)
If vector-unaligned-mem support is not enabled, we should not generate
loads/stores that are not aligned to their element size.

We already do this for non-VP vector loads/stores.

This code has been in our downstream for about a year and a half after
finding the vectorizer generating misaligned loads/stores. I don't think
that is unique to our downstream.

Doing this for masked vp.load/store requires widening the mask as well
which is harder to do.

NOTE: Because we have to scale the VL, this will introduce additional
vsetvli and the VL optimizer will not be effective at optimizing any
arithmetic that is consumed by the store.
2025-11-18 11:13:54 -08:00
Hongyu Chen
523bd2df6d
[GISel][RISCV] Compute CTPOP of small odd-sized integer correctly (#168559)
Fixes the assertion in #168523
This patch lifts the small, odd-sized integer to 8 bits, ensuring that
the following lowering code behaves correctly.
2025-11-18 18:49:13 +00:00
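The idea of lifting a small, odd-sized integer to 8 bits before counting bits (#168559) can be sketched as follows. This is an illustration under assumptions, not the patch itself: `ctpop8` and `ctpopSmall` are hypothetical names, and the value is modeled as the low `Width` bits of a `uint8_t`.

```cpp
#include <cassert>
#include <cstdint>

// Classic parallel bit count, written for exactly 8 bits.
inline unsigned ctpop8(uint8_t V) {
  V = static_cast<uint8_t>(V - ((V >> 1) & 0x55));          // 2-bit sums
  V = static_cast<uint8_t>((V & 0x33) + ((V >> 2) & 0x33)); // 4-bit sums
  V = static_cast<uint8_t>((V + (V >> 4)) & 0x0F);          // 8-bit sum
  return V;
}

// Popcount of a Width-bit integer (Width <= 8): zero-extend to 8 bits by
// masking off everything above Width, then reuse the 8-bit expansion.
inline unsigned ctpopSmall(uint8_t Bits, unsigned Width) {
  assert(Width >= 1 && Width <= 8 && "sketch only covers sub-byte widths");
  uint8_t Mask = static_cast<uint8_t>((1u << Width) - 1);
  return ctpop8(static_cast<uint8_t>(Bits & Mask));
}
```

The masks in `ctpop8` assume a width of 8, which is why the sketch widens the input first rather than trying to derive masks for a 3-bit or 5-bit type.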
Nathan Corbyn
93a8ca8fc7
[AArch64][GISel] Don't crash in known-bits when copying from vectors to non-vectors (#168081)
Updates the demanded elements before recursing through copies in case
the type of the source register changes from a non-vector register to a
vector register.

Fixes #167842.
2025-11-18 16:42:58 +00:00
Hassnaa Hamdi
3d5d32c605
[CGP]: Optimize mul.overflow. (#148343)
- Detect cases where LHS & RHS values will not cause overflow
(when the high halves are zero).
2025-11-18 13:15:47 +00:00
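The mul.overflow observation in #148343 can be sketched in plain C++ on 64-bit values. This is a hedged illustration, not the CodeGenPrepare transform: the 64-bit width, the `MulResult` struct, and `umulOverflow64` are assumptions made for the example.

```cpp
#include <cstdint>

// Result of an unsigned multiply-with-overflow (hypothetical helper type).
struct MulResult {
  uint64_t Value; // low 64 bits of the product
  bool Overflow;  // true if the full product does not fit in 64 bits
};

inline MulResult umulOverflow64(uint64_t L, uint64_t R) {
  // Fast path: if both high 32-bit halves are zero, each operand is below
  // 2^32, so the product is below 2^64 and cannot overflow.
  if ((L >> 32) == 0 && (R >> 32) == 0)
    return {L * R, false};

  // Slow path: general check. The wrapped product divided by L must give
  // back R unless the multiplication overflowed.
  uint64_t P = L * R;
  bool Ovf = L != 0 && P / L != R;
  return {P, Ovf};
}
```

The fast path needs only two shifts and compares, so it avoids the more expensive general overflow check whenever both operands are small.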
David Green
4ecfaa602f
[AArch64][GlobalISel] Add better basic legalization for llround. (#168427)
This adds handling for f16 and f128 lround/llround under LP64 targets,
promoting the f16 where needed and using a libcall for f128. This
codegen is now identical to the selection dag version.
2025-11-18 12:05:02 +00:00
Sander de Smalen
f369a53d82
[DAGCombiner] Fold select into partial.reduce.add operands. (#167857)
This generates more optimal codegen when using partial reductions with
predication.

```
partial_reduce_*mla(acc, sel(p, mul(*ext(a), *ext(b)), splat(0)), splat(1))
-> partial_reduce_*mla(acc, sel(p, a, splat(0)), b)

partial.reduce.*mla(acc, sel(p, *ext(op), splat(0)), splat(1))
-> partial.reduce.*mla(acc, sel(p, op, splat(0)), splat(trunc(1)))
```
2025-11-18 09:49:42 +00:00
Aiden Grossman
472e4ab0b0
[MLGO] Fully Remove MLRegalloc Experimental Features (#168252)
20a22a45e96bc94c3a8295cccc9031bd87552725 was supposed to fully remove
these, but left around the functionality to actually compute them and a
unittest that ensured they worked. These are not development features in
the sense of features used in development mode, but experimental
features that have been superseded by MIR2Vec.
2025-11-17 10:07:48 -08:00
Ryan Cowan
d65be16ab6
[AArch64][GlobalISel] Add combine for build_vector(unmerge, unmerge, undef, undef) (#165539)
This PR adds a new combine to the `post-legalizer-combiner` pass. The
new combine checks for vectors being unmerged and subsequently padded
with `G_IMPLICIT_DEF` values by building a new vector. If such a case is
found, the vector being unmerged is instead just concatenated with a
`G_IMPLICIT_DEF` that is as wide as the vector being unmerged.

This removes unnecessary `mov` instructions in a few places.
2025-11-17 15:55:40 +00:00
David Green
22968f5b4a
[DAG] Add strictfp implicit def reg after metadata. (#168282)
This prevents a machine verifier error, where it "Expected implicit
register after groups".

Fixes #158661
2025-11-17 10:57:21 +00:00
Abinaya Saravanan
c946418330
[MachinePipeliner] Detect a cycle in PHI dependencies early on (#167095)
- This patch detects cycles formed by PHIs and bails out if one is found.
- This prevents violating DAG restrictions.

Abort pipelining in the case below:

%1 = phi i32 [ %a, %entry ], [ %3, %loop ]
%2 = phi i32 [ %a, %entry ], [ %1, %loop ]
%3 = phi i32 [ %b, %entry ], [ %2, %loop ]

---------

Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro@fujitsu.com>
2025-11-17 15:28:30 +05:30
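The PHI cycle described in the MachinePipeliner change (#167095) can be modeled with a small standalone sketch. It assumes each PHI is reduced to the value it receives along the loop back edge; `BackEdgeMap`, `phiChainHasCycle`, and `anyPhiCycle` are hypothetical names, and the real pass works on machine instructions rather than integer ids.

```cpp
#include <unordered_map>
#include <unordered_set>

// Maps each PHI's id to the id of the value it takes along the loop back edge.
// Ids that are not keys in the map stand for non-PHI values.
using BackEdgeMap = std::unordered_map<int, int>;

// Follow back-edge operands starting at Start. Returns true if the walk stays
// on PHIs and reaches a PHI that was already visited.
inline bool phiChainHasCycle(const BackEdgeMap &Phis, int Start) {
  std::unordered_set<int> Seen;
  int Cur = Start;
  while (Phis.count(Cur)) {
    if (!Seen.insert(Cur).second)
      return true; // revisited a PHI: the chain loops back on itself
    Cur = Phis.at(Cur);
  }
  return false; // the chain left the set of PHIs
}

// Returns true if any PHI participates in, or leads into, such a cycle.
inline bool anyPhiCycle(const BackEdgeMap &Phis) {
  for (const auto &KV : Phis)
    if (phiChainHasCycle(Phis, KV.first))
      return true;
  return false;
}
```

With the example from the commit message, %1 takes %3, %2 takes %1, and %3 takes %2 along the back edge, so the map `{{1, 3}, {2, 1}, {3, 2}}` is reported as cyclic.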
pvanhout
853ed3b3b7
[InlineAsmLowering] unsigned -> TypeSize for getTypeStoreSize result 2025-11-17 10:21:43 +01:00
hstk30-hw
51c8180515
[GlobalMerge] Prefer global-merge-max-offset over the target-specific constant offset. (#165591)
In the Dhrystone benchmark, I found that some adjacent globals are not
merged, whereas GCC's anchor optimization handles them. Using
global-merge-max-offset to set the max offset can yield similar results
(still slightly different, but at least we can control the offset).
2025-11-17 15:37:51 +08:00
ronlieb
6d5f87fc42
Revert "DAG: Allow select ptr combine for non-0 address spaces" (#168292)
Reverts llvm/llvm-project#167909
2025-11-16 18:35:51 -05:00
Kazu Hirata
98d49d51c0
[CodeGen] Remove a redundant declaration (NFC) (#168285)
EnableFSDiscriminator is declared in DebugInfoMetadata.h.

Identified with readability-redundant-declaration.
2025-11-16 14:06:18 -08:00
Matt Arsenault
dd9bd3e8f0
DAG: Preserve poison in combineConcatVectorOfScalars (#168220) 2025-11-16 11:16:34 -08:00
Sergei Barannikov
97a60aa37a
[CodeGen] Turn MCRegUnit into an enum class (NFC) (#167943)
This changes `MCRegUnit` type from `unsigned` to `enum class : unsigned`
and inserts necessary casts.
The added `MCRegUnitToIndex` functor is used with `SparseSet`,
`SparseMultiSet` and `IndexedMap` in a few places.

`MCRegUnit` is opaque to users, so it didn't seem worth making it a
full-fledged class like `Register`.

Static type checking has detected one issue in
`PrologueEpilogueInserter.cpp`, where `BitVector` created for
`MCRegister` is indexed by both `MCRegister` and `MCRegUnit`.

The number of casts could be reduced by using `IndexedMap` in more
places and/or adding a `BitVector` adaptor, but the number of casts *per
file* is still small and `IndexedMap` has limitations, so it didn't seem
worth the effort.

Pull Request: https://github.com/llvm/llvm-project/pull/167943
2025-11-16 20:46:44 +03:00
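The pattern described in #167943, an `enum class : unsigned` used as a strong typedef plus a functor that converts it back to an index, can be sketched as below. The names `RegUnit`, `RegUnitToIndex`, and `UnitFlags` are hypothetical stand-ins, not the LLVM types, and the container is a plain `std::vector<bool>` rather than `SparseSet` or `IndexedMap`.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical strong typedef: same representation as unsigned, but no
// implicit conversion to or from it.
enum class RegUnit : unsigned {};

// Static type checking now rejects accidental mixing with raw integers.
static_assert(!std::is_convertible<unsigned, RegUnit>::value,
              "unsigned must not convert implicitly to RegUnit");
static_assert(!std::is_convertible<RegUnit, unsigned>::value,
              "RegUnit must not convert implicitly to unsigned");

// Functor that maps a unit to a dense index, so containers can be keyed by
// RegUnit without scattering casts through the code.
struct RegUnitToIndex {
  unsigned operator()(RegUnit U) const { return static_cast<unsigned>(U); }
};

// Tiny indexed container keyed by RegUnit (illustration only).
class UnitFlags {
  std::vector<bool> Bits;

public:
  explicit UnitFlags(unsigned NumUnits) : Bits(NumUnits, false) {}
  void set(RegUnit U) { Bits[RegUnitToIndex()(U)] = true; }
  bool test(RegUnit U) const { return Bits[RegUnitToIndex()(U)]; }
};
```

Keeping the cast inside one functor is the design point: a container indexed by one id type cannot silently be indexed by another, which is the kind of mix-up the commit message reports finding in `PrologueEpilogueInserter.cpp`.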
Sergei Barannikov
e413343ca7
[SelectionDAG] Verify SDTCisVT and SDTCVecEltisVT constraints (#150125)
Teach `SDNodeInfoEmitter` TableGen backend to process `SDTypeConstraint`
records and emit tables for them. The tables are used by
`SDNodeInfo::verifyNode()` to validate a node being created.

This PR only adds validation code for `SDTCisVT` and `SDTCVecEltisVT`
constraints to keep it smaller.

Pull Request: https://github.com/llvm/llvm-project/pull/150125
2025-11-16 18:26:03 +03:00
AZero13
d831f8df52
[SelectionDAG] Fix AArch64 machine verifier bug when expanding LOOP_DEPENDENCE_MASK (#168221)
TargetConstant nodes don't match TableGen ImmLeaf patterns during
instruction selection. When this zero constant flows into the AArch64
CCMP formation code, the machine verifier hits an assertion in expensive
checks.

Fixes: #168227
2025-11-15 21:12:11 +00:00
Austin
700aa5e376
[revert][CodeGen] add a command to force global merge (#168230)
sorry, this was my mistake
2025-11-16 03:40:07 +08:00
Austin
3705921f60
[CodeGen] add a command to force global merge
I found that in some performance scenarios, such as under O2, this PR can help with sequences of global variable loads.
2025-11-16 03:20:27 +08:00
Matt Arsenault
70349c17d3
DAG: Use poison in SplitVecRes_VP_LOAD_FF (#167753) 2025-11-15 08:48:36 -08:00
Matt Arsenault
33a7bb1f1a
DAG: Use poison when legalizing scalar_to_vector results (#167751) 2025-11-15 08:47:08 -08:00
Ryan Cowan
f8d65fd874
[AArch64][GlobalISel] Improve lowering of vector fp16 fpext (#165554)
This PR improves the lowering of vectors of fp16 when using fpext.

Previously vectors of fp16 were scalarized leading to lots of extra
instructions. Now, vectors of fp16 will be lowered when extended to fp64
via the preexisting lowering logic for extends. To make use of the
existing logic, we need to add elements until we reach the next power of
2.
2025-11-14 20:52:51 -08:00
Mikołaj Piróg
e7b41df10e
[SelectionDAGBuilder] Propagate fast-math flags to fpext (#167574)
As in the title. Without this, fpext always behaves in SelectionDAG as
if it had no fast-math flags.
2025-11-14 20:50:59 -08:00
Craig Topper
5442aa1853
[RDF] Rename RegisterId field in RegisterRef Reg->Id. NFC (#168154)
Not all RegisterId values are registers, so Id is a more appropriate
name.

Use asMCReg() in some places that assumed it was a register.
2025-11-14 18:33:50 -08:00
Sergei Barannikov
4eea157301
[GlobalISel] Return byte offsets from computeValueLLTs (NFC) (#166747)
This avoids scaling offsets back and forth. It is also what the
SelectionDAG equivalent (ComputeValueVTs) does, and it will allow
reusing ComputeValueTypes with less effort.
2025-11-15 00:23:26 +00:00
Matt Arsenault
862d34666f
opt: Fix bad merge of #167996 (#168110)
After the base branch was moved to main, this somehow ended up
adding a second definition of RTLCI, instead of modifying the
existing one.

Also fix other build error with gcc bots.
2025-11-14 12:03:26 -08:00
Matt Arsenault
590ab43e8a
RuntimeLibcalls: Move VectorLibrary handling into TargetOptions (#167996)
This fixes the -fveclib flag getting lost on its way to the backend.

Previously this was its own cl::opt with a random boolean. Move the
flag handling into CommandFlags with other backend ABI-ish options,
and have clang directly set it, rather than forcing it to go through
command line parsing.

Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector
function. Clang has special handling for TargetLibraryInfo, where it
would directly construct one with the vector library in the pass pipeline.
RuntimeLibcallsInfo currently is not used as an analysis in codegen, and
needs to know the vector library when constructed.

RuntimeLibraryAnalysis could follow the same trick that
TargetLibraryInfo is using in the future, but a lot more boilerplate
changes are needed to thread that analysis through codegen. Ideally this
would come from an IR module flag, and nothing would be in
TargetOptions. For now, it's better for all of these sorts of controls
to be consistent.
2025-11-14 11:19:21 -08:00
Craig Topper
7108b12f6b
[RDF] RegisterRef/RegisterId improvements. NFC (#168030)
RegisterId can represent a physical register, an MCRegUnit, or
an index into a side structure that stores register masks. These 3
types were encoded by using the physical reg, stack slot, and
virtual register encoding partitions from the Register class.
This aliasing of encoding schemes wasn't well contained, so
Register::index2StackSlot and Register::stackSlotIndex appeared
in multiple places.

This patch gives RegisterRef its own encoding defines and separates
it from Register.

I've removed the generic idx() method in favor of getAsMCReg(),
getAsMCRegUnit(), and getMaskIdx() for some degree of type safety.

Some places used the RegisterId field of RegisterRef directly as a
register. Those have been updated to use getAsMCReg.
    
Some special cases for RegisterId 0 have been removed as it can
be treated like an MCRegister by existing code.
    
I think I want to rename the Reg field of RegisterRef to Id, but
I'll do that in another patch.
    
Additionally, callers of the RegisterRef constructor need to be
audited for implicit conversions from Register/MCRegister
to unsigned.
2025-11-14 10:30:25 -08:00
Pierre van Houtryve
31b7f1fa0b
[GlobalISel] Add support for value/constants as inline asm memory operand (#161501)
InlineAsmLowering rejected inline assembly with memory reference inputs
if the values passed to the inline asm weren't pointers. The DAG
lowering however handled them just fine.

This patch updates InlineAsmLowering to store such values on the stack,
and then use the stack pointer as the "indirect" version of the operand.
2025-11-14 10:34:38 +01:00
Craig Topper
388ef61250
[RegAllocGreedy] Use MCRegister instead of MCPhysReg. NFC (#167974) 2025-11-13 23:26:35 +00:00
Sergei Barannikov
12edc56f2b
[RegAllocFast] Add helper methods for getting/setting regunit state(NFC) (#167931)
The methods will help reduce the number of static_casts after changing
MCRegUnit to a strong typedef.
2025-11-13 19:34:37 +00:00
Sergei Barannikov
0b5f38894a
[CodeGen] Use VirtRegOrUnit/MCRegUnit in MachineTraceMetrics (NFC) (#167859) 2025-11-13 19:10:41 +00:00
Matt Arsenault
e5f499f48f
DAG: Allow select ptr combine for non-0 address spaces (#167909) 2025-11-13 18:58:08 +00:00
Sergei Barannikov
d1cc1376a0
[CodeGen] Add TRI::regunits() iterating over all register units (NFC) (#167901) 2025-11-13 17:27:35 +00:00
Craig Topper
8d6a1def4d
[SelectionDAGISel] Don't merge input chains if it would put a token factor in the way of a glue. (#167805)
In the new test, we're trying to fold a load and a X86ISD::CALL. The
call has a CopyToReg glued to it. The load and the call have different
input chains so they need to be merged. This results in a TokenFactor
that gets put between the CopyToReg and the final CALLm instruction. The
DAG scheduler can't handle that.

The load here was created by legalization of the extract_element using a
stack temporary store and load. A normal IR load would be chained into
call sequence by SelectionDAGBuilder. This would usually have the load
chained in before the CopyToReg. The store/load created by legalization
don't get chained into the rest of the DAG.

Fixes #63790
2025-11-13 09:25:53 -08:00
Sergei Barannikov
98f9b54376
[CodeGen] Hide SparseSet<LiveRegUnit> behind a typedef (NFC) (#167898)
So that changing the type of the container (planned in a future patch)
is less intrusive.
2025-11-13 20:14:23 +03:00
Craig Topper
0acdbd5d81
[InstrRef] Consistently use MLocTracker::getLocID() before calling lookupOrTrackRegister (#167841)
The LocID for registers is just the register ID. The getLocID function
is supposed to hide this detail, but it wasn't being used consistently.

This avoids a bunch of implicit casts from Register or MCRegister to
unsigned.
2025-11-13 08:38:35 -08:00
Tomer Shafir
35ffe10349
[opt] Add --save-stats option (#167304)
This patch adds a Clang-compatible --save-stats option to opt, to
provide an easy-to-use way to save LLVM statistics files when working
with opt on the middle end.

This is a follow up on the addition to `llc`:
https://github.com/llvm/llvm-project/pull/163967

Like on Clang, one can specify --save-stats, --save-stats=cwd, and
--save-stats=obj with the same semantics and JSON format. The
pre-existing --stats option is not affected.

The implementation extracts the flag and its methods into the common
`CodeGen/CommandFlags` as `LLVM_ABI`, using a new registration class to
conservatively enable opt-in rather than let all tools take it. It's only
needed for llc and opt for now. Then it refactors llc and adds support
for opt.
2025-11-13 16:03:28 +02:00
Simon Pilgrim
a5342d5fe5
Revert "[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1 a usubo.0)" (#167854)
Reverts llvm/llvm-project#161651 due to downstream bad codegen reports
2025-11-13 10:46:38 +00:00
Sergei Barannikov
ef9a02ce02
[CodeGen] Use VirtRegOrUnit where appropriate (NFCI) (#167730)
Use it in `printVRegOrUnit()`, `getPressureSets()`/`PSetIterator`,
and in functions/classes dealing with register pressure.

Static type checking revealed several bugs, mainly in MachinePipeliner.
I'm not very familiar with this pass, so I left a bunch of FIXMEs.

There is one bug in `findUseBetween()` in RegisterPressure.cpp, also
annotated with a FIXME.
2025-11-13 10:26:58 +00:00
Craig Topper
99a726ea51
[SelectionDAGISel] Const correct ChainNodesMatched argument to HandleMergeInputChains. NFC (#167807) 2025-11-12 22:56:57 -08:00
Matt Arsenault
2e489f77ba
CodeGen: Fix CodeView crashes with empty llvm.dbg.cu (#163286) 2025-11-12 14:59:42 -08:00