14192 Commits

Author SHA1 Message Date
Fabian Ritter
8adcc8a669
[SelectionDAG] Introduce ISD::PTRADD (#140017)
This opcode represents the addition of a pointer value (first operand)
and an integer offset (second operand). PTRADD nodes are only generated
if the TargetMachine opts in by overriding
TargetMachine::shouldPreservePtrArith().

The PTRADD node and respective visitPTRADD() function were adapted by
@rgwott from the CHERI/Morello LLVM tree.
Original authors: @davidchisnall, @jrtc27, @arichardson.

The changes in this PR were extracted from PR #105669.

---------

Co-authored-by: David Chisnall <github@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
Co-authored-by: Alexander Richardson <alexrichardson@google.com>
Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
2025-05-28 09:09:17 +02:00
Kerry McLaughlin
b61144bf77
[AArch64] Allow lowering of more types to GET_ACTIVE_LANE_MASK (#140062)
Adds support for operand promotion and splitting/widening the result
of the ISD::GET_ACTIVE_LANE_MASK node.
For AArch64, shouldExpandGetActiveLaneMask now returns false for more
types which we know can be legalised.
2025-05-27 11:21:57 +01:00
Jon Roelofs
714096c132
[LLVM] Skip dumping inline SDag children (#141359)
If they're simple enough to render inline, we don't need to dump them
again in the recursive walk.
2025-05-26 19:40:01 -07:00
Luke Lau
3033f202f6
[IR] Add llvm.vector.[de]interleave{4,6,8} (#139893)
This adds [de]interleave intrinsics for factors of 4,6,8, so that every
interleaved memory operation supported by the in-tree targets can be
represented by a single intrinsic.

For context, [de]interleaves of fixed-length vectors are represented by
a series of shufflevectors. The intrinsics are needed for scalable
vectors, and we don't currently scalably vectorize all possible factors
of interleave groups supported by RISC-V/AArch64.

The underlying reason for this is that higher factors are currently
represented by interleaving multiple interleaves themselves, which made
sense at the time in the discussion in
https://github.com/llvm/llvm-project/pull/89018.

But after trying to integrate these for higher factors on RISC-V I think
we should revisit this design choice:

- Matching these in InterleavedAccessPass is non-trivial: We currently
only support factors that are a power of 2, and detecting this requires
a good chunk of code
- The shufflevector masks used for [de]interleaves of fixed-length
vectors are much easier to pattern match as they are strided patterns,
but for the intrinsics it's much more complicated to match as the
structure is a tree.
- Unlike shufflevectors, there's no optimisation that happens on
[de]interleave2 intrinsics
- For non-power-of-2 factors e.g. 6, there are multiple possible ways a
[de]interleave could be represented, see the discussion in #139373
- We already have intrinsics for 2,3,5 and 7, so by avoiding 4,6 and 8
we're not really saving much

By representing these higher factors as interleaved-interleaves, we can
in theory support arbitrarily high interleave factors. However I'm not
sure this is actually needed in practice: SVE only has instructions
for factors 2,3,4, whilst RVV only supports up to factor 8.

This patch would make it much easier to support scalable interleaved
accesses in the loop vectorizer for RISC-V for factors 3,5,6 and 7, as
the loop vectorizer and InterleavedAccessPass wouldn't need to
construct and match trees of interleaves.

For interleave factors above 8, for which there are no hardware memory
operations to match in the InterleavedAccessPass, we can still keep the
wide load + recursive interleaving in the loop vectorizer.
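
The equivalence between a single higher-factor interleave and a tree of factor-2 interleaves can be sketched as follows. This is an illustrative model only: vectors are plain Python lists, and `interleave`/`deinterleave` are hypothetical helpers, not LLVM APIs.

```python
# Illustrative model only: vectors are Python lists, not LLVM values.

def interleave(*vecs):
    """Interleave N equal-length vectors lane by lane (models interleaveN)."""
    assert len({len(v) for v in vecs}) == 1, "all inputs must have equal length"
    return [v[i] for i in range(len(vecs[0])) for v in vecs]

def deinterleave(vec, factor):
    """Split a vector into `factor` strided sub-vectors (models deinterleaveN)."""
    assert len(vec) % factor == 0
    return [vec[k::factor] for k in range(factor)]

a, b, c, d = [0, 1], [10, 11], [20, 21], [30, 31]

# A single factor-4 interleave, as one dedicated intrinsic would express it.
flat = interleave(a, b, c, d)

# The same result built as a tree of factor-2 interleaves.
tree = interleave(interleave(a, c), interleave(b, d))
```

Both forms produce the same lanes, but a pass that wants to recognise the second form has to match the whole tree, including the `a, c` / `b, d` operand pairing.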
2025-05-26 18:45:12 +01:00
Jon Roelofs
346a72f2ca
[LLVM] Add color to SDNode ID's when dumping (#141295)
This is especially helpful for the recursive 'Cannot select:' dumps,
where colors help distinguish nodes at a quick glance.
2025-05-24 09:40:29 -07:00
Kazu Hirata
3bc174ba77
[CodeGen] Remove unused includes (NFC) (#141320)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 00:00:00 -07:00
Tim Gymnich
760bf4f116
[GISel] Add KnownFPClass Analysis to GISelValueTrackingPass (#134611)
- add KnownFPClass analysis to GISelValueTrackingPass
- add MI pattern for `m_GIsFPClass`
2025-05-23 14:38:51 +02:00
Craig Topper
c432936b05
[SelectionDAG][RISCV] Use VP_STORE to widen MSTORE in type legalization when possible. (#140991)
Widening the mask and padding with zeros doesn't work for scalable
vectors. Using VL produces less code for fixed vectors.

A similar change was recently made for MLOAD.
2025-05-22 08:28:42 -07:00
Rahul Joshi
1fdf02ad5a
[LLVM][CodeGen] Add convenience accessors for MachineFunctionProperties (#140002)
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to
MachineFunctionProperties.
2025-05-22 08:07:52 -07:00
Pierre van Houtryve
b5e2a236b9
[CodeGen] Add SSID & Atomic Ordering to IntrinsicInfo (#140896)
getTgtMemIntrinsic should be able to propagate such information to the
MMO
2025-05-22 11:42:01 +02:00
Craig Topper
0a42db682a [SelectionDAG] Simplify creation of getStoreVP in WidenVecOp_STORE. NFC
We can use the offset from the original store instead of creating
a new undef offset.

We didn't check if the offset was undef already so we really shouldn't
drop it if it isn't.
2025-05-21 17:46:08 -07:00
Craig Topper
60ad6e3fa4
[SelectionDAG][RISCV] Use VP_LOAD to widen MLOAD in type legalization when possible. (#140595)
Padding the mask using 0 elements doesn't work for scalable vectors. Use
VP_LOAD and change the VL instead.
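
A minimal sketch of the two widening strategies, using Python lists as a stand-in for vectors. The helpers `masked_load` and `vp_load` are hypothetical models, and the `fill` value standing in for inactive VP lanes is an assumption made for illustration.

```python
def masked_load(memory, mask, passthru):
    """Model of a masked load: lane i reads memory[i] only if mask[i] is set."""
    return [memory[i] if m else p
            for i, (m, p) in enumerate(zip(mask, passthru))]

def vp_load(memory, mask, evl, width, fill=0):
    """Model of a VP load: lanes at or beyond `evl` are never accessed."""
    return [memory[i] if i < evl and mask[i] else fill for i in range(width)]

memory = [7, 8, 9]            # only three elements may be touched
mask3 = [True, False, True]   # original 3-lane mask

# Fixed-length approach: widen to 4 lanes and pad the mask with a zero lane.
padded = masked_load(memory, mask3 + [False], [0, 0, 0, 0])

# VP approach: the extra mask lane can hold anything; evl=3 keeps it inactive.
with_vl = vp_load(memory, mask3 + [True], evl=3, width=4)
```

In the VP form the number of live lanes is carried by the vector length rather than by constant mask padding, which is what makes it usable for scalable vectors.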

This fixes a crash for Zve32x. The test file was split since i64 isn't a valid
element type for Zve32x.

Fixes #140198.
2025-05-21 15:52:08 -07:00
Craig Topper
ee4002da2b [TargetLowering] Use getExtractSubvector/getExtractVectorElt. NFC 2025-05-21 12:06:54 -07:00
Paul Walker
5dfaf8418d
[LLVM][AArch64] Correctly lower funnel shifts by constants. (#140058)
Prevent LowerFunnelShift from creating an invalid ISD::FSHR when
lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP
that DAGCombiner will optimise away. However, we should not rely on this
and so this PR mirrors the same optimisation.
    
Ensure LowerFunnelShift normalises constant shift amounts because isel
rules expect them to be in the range [0, src bit length).
    
NOTE: To simplify testing, this PR also adds a command line option to
disable the DAG combiner (-combiner-disabled).
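
A minimal sketch of why a zero shift amount needs special handling, assuming the lowering rewrites an FSHL by C as an FSHR by BW - C (an assumption for illustration; the message above does not spell out the rewrite). The helpers model 8-bit funnel shifts on Python integers.

```python
BW = 8
MASK = (1 << BW) - 1

def fshl(x, y, c):
    """Concatenate x:y, shift left by c (mod BW), keep the high BW bits."""
    c %= BW
    return ((x << c) | (y >> (BW - c))) & MASK

def fshr(x, y, c):
    """Concatenate x:y, shift right by c (mod BW), keep the low BW bits."""
    c %= BW
    return ((x << (BW - c)) | (y >> c)) & MASK

def lower_fshl_via_fshr(x, y, c):
    """Hypothetical lowering: normalise c, treat 0 as a no-op, else use FSHR."""
    c %= BW                 # keep the amount in [0, BW)
    if c == 0:
        return x            # fshl by 0 returns the first operand unchanged
    return fshr(x, y, BW - c)
```

Without the `c == 0` case the rewritten amount would be BW, which lies outside `[0, BW)`; reducing it modulo BW would give `fshr(x, y, 0) == y` instead of `x`.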
2025-05-20 11:15:21 +01:00
Benjamin Maxwell
c9d6249198
[SDAG] Ensure load is included in output chain of sincos expansion (#140525)
The load not being included in the chain meant that it could materialize
after a `@llvm.lifetime.end` annotation on the pointer. This could
result in miscompiles if the stack slot is reused for another value.

Fixes https://github.com/llvm/llvm-project/issues/140491
2025-05-20 10:43:50 +01:00
David Green
b95ad8eca6
[DAGCombine] Use isLegalExtLoad for MatchLoadCombine (#140536)
This looks wrong to me, but I don't have a test case where it alters the
generated code.
2025-05-20 09:59:41 +01:00
Alexander Richardson
07e2ba445d
[AMDGPU] Set AS8 address width to 48 bits
Of the 128 bits of the buffer descriptor, only 48 are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logical conclusion is to set the index width to 48 bits instead of
the current value of 128.

Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.

Reviewed By: krzysz00

Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-19 17:26:05 -07:00
Liam Semeria
d067014f13
[APInt] Added APInt::clearBits() method (#137098)
Added APInt::clearBits(unsigned loBit, unsigned hiBit) that clears bits within a certain range.
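
A minimal sketch of the intended effect on a fixed-width unsigned value. It assumes the range is half-open, `[loBit, hiBit)`, by analogy with `APInt::setBits`; `clear_bits` is a hypothetical Python model, not the C++ method.

```python
def clear_bits(value, lo_bit, hi_bit, width=32):
    """Clear bits lo_bit .. hi_bit-1 of a `width`-bit unsigned value.

    Assumes a half-open range [lo_bit, hi_bit).
    """
    assert 0 <= lo_bit <= hi_bit <= width
    range_mask = ((1 << (hi_bit - lo_bit)) - 1) << lo_bit
    return value & ~range_mask & ((1 << width) - 1)

cleared = clear_bits(0xFFFFFFFF, 8, 16)
```

With these inputs bits 8 through 15 are cleared and every other bit is left as it was; an empty range (`lo_bit == hi_bit`) leaves the value unchanged.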

Fixes #136550

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-05-19 12:41:04 +01:00
Piotr Fusik
9e22f9611a
[DAGCombiner] Fix a "subtraction if above a constant threshold" miscompile (#140042)
This fixes #135194 incorrectly reusing the existing `add nuw/nsw`
while the transformed code relies on an unsigned wrap.
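
A minimal sketch of why no-wrap flags cannot be carried over, assuming the fold rewrites `x >= C ? x - C : x` as `umin(x, x - C)` (an assumption for illustration; the message above does not spell out the pattern). Values are modelled as 8-bit unsigned integers.

```python
BW = 8
MASK = (1 << BW) - 1

def select_form(x, c):
    """Original form: subtract only when x is at or above the threshold."""
    return (x - c) & MASK if x >= c else x

def umin_form(x, c):
    """Assumed folded form: always subtract, then take the unsigned minimum."""
    wrapped = (x - c) & MASK   # wraps around whenever x < c
    return min(x, wrapped)

def subtraction_wraps(x, c):
    """True when the unconditional subtraction relies on unsigned wrap."""
    return x < c
```

In the select form the subtraction only runs when `x >= c`, so it never wraps. In the umin form it runs unconditionally and wraps for every `x < c`, and the result is only correct because the wrapped value is larger than `x`.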
2025-05-17 12:18:52 +02:00
Craig Topper
aaaae99663 [SelectionDAG] Use getInsertSubvector/VectorElt and getExtractSubvector/VectorElt in LegalizeVectorTypes. NFC 2025-05-16 23:37:03 -07:00
Craig Topper
dcd62f3674
[SelectionDAG] Rename MemSDNode::getOriginalAlign to getBaseAlign. NFC (#139930)
This matches the underlying function in MachineMemOperand and how it is
printed when BaseAlign differs from Align.
2025-05-16 09:37:02 -07:00
Pierre van Houtryve
5e7bc5e080
[DAGCombiner] Remove hasOneUse check from sext+sext_inreg to sext_inreg combine (#140207)
The hasOneUse check does not really add anything and makes the combine too
restrictive. Upcoming patches benefit from removing the hasOneUse check.
2025-05-16 10:25:49 +02:00
Kazu Hirata
18ecff4f65
[llvm] Use llvm::stable_sort (NFC) (#140067) 2025-05-15 12:18:18 -07:00
Alexander Peskov
2bc9f43ba1
[DAGCombiner] Fold pattern for srl-shl-zext (REAPPLIED) (#140038)
Fold (srl (lop x, (shl (zext y), c1)), c1) -> (lop (srl x, c1), (zext y)) where c1 <= leadingzeros(zext(y)).

This is equivalent to the existing fold chain (srl (shl (zext y), c1), c1) -> (and (zext y), mask) -> (zext y), but a logical op in the middle prevents it from combining.

Benefit: reduces the number of instructions.
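
The fold can be checked numerically with a small model. This sketch assumes 32-bit values with `y` zero-extended from 8 bits, so `zext(y)` always has at least 24 leading zeros; `before`, `after` and `fold_holds` are hypothetical helper names.

```python
import operator
import random

BW = 32
MASK = (1 << BW) - 1
Y_BITS = 8  # y is zero-extended from 8 to 32 bits

def before(lop, x, y, c1):
    """(srl (lop x, (shl (zext y), c1)), c1) on 32-bit values."""
    return (lop(x, (y << c1) & MASK) & MASK) >> c1

def after(lop, x, y, c1):
    """(lop (srl x, c1), (zext y))"""
    return lop(x >> c1, y)

def fold_holds(trials=2000, seed=0):
    """Randomly test the fold for AND/OR/XOR with c1 <= leadingzeros(zext y)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(BW)
        y = rng.getrandbits(Y_BITS)
        c1 = rng.randint(0, BW - Y_BITS)  # never shifts bits of y out
        for lop in (operator.and_, operator.or_, operator.xor):
            if before(lop, x, y, c1) != after(lop, x, y, c1):
                return False
    return True
```

The bound on `c1` matters: with `y = 0xFF` and `c1 = 28` the left shift drops the top four bits of `y`, and the two forms no longer agree.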

Original commit: #138290 / bbc5221

Previously reverted due to a conflict in a LIT test: mainline changed the
default version of the load instruction to the untyped version in #137698.
The updated test uses `ld.param.b64` instead of `ld.param.u64`.
2025-05-15 18:04:33 +01:00
Kazu Hirata
9658c55116 [SelectionDAG] Fix a warning
This patch fixes:

  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7506:17: error:
  unused variable 'NoFPClass' [-Werror,-Wunused-variable]
2025-05-15 07:05:33 -07:00
Kerry McLaughlin
0bc3993716
[SelectionDAG] Add an ISD node for get.active.lane.mask (#139084)
For now expansion still happens in SelectionDAGBuilder when
GET_ACTIVE_LANE_MASK is not legal on the target.

This patch also includes changes in AArch64ISelLowering to replace
handling of the get.active.lane.mask intrinsic to use the ISD node.
Tablegen patterns are added which match to whilelo for scalable types.
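
For reference, the semantics of get.active.lane.mask can be modelled as follows: lane `i` is active when `base + i` is below `n`, with the addition treated as non-wrapping. This is a Python sketch with a hypothetical helper name, not the lowering itself.

```python
def get_active_lane_mask(base, n, lanes):
    """Lane i is active iff base + i < n (unsigned, addition does not wrap).

    Python integers are unbounded, so base + i never wraps here.
    """
    return [base + i < n for i in range(lanes)]

# Last iteration of a loop with trip count 10 and 4 lanes per vector.
tail_mask = get_active_lane_mask(8, 10, 4)
```

In a vectorized loop `base` would typically be the first index handled by the current vector iteration and `n` the trip count, so only the final iteration yields a partially active mask.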

A follow up change will add support for more types to be lowered to
GET_ACTIVE_LANE_MASK by allowing splitting of the node.
2025-05-15 09:14:46 +01:00
YunQiang Su
780054d3ff
CodeGen: Add ISD::AssertNoFPClass (#138839)
It is used to mark a value that we are sure is not of some fcType.
Examples include:

  * An argument of a function that is marked with nofpclass
  * The output value of an intrinsic that is known not to be of some class

This lets subsequent operations make assumptions about the value.
2025-05-15 16:05:15 +08:00
Simon Pilgrim
ca912c7c08
Revert bbc5221c95343d8d6869dce83d6fcf183767bd9f "[DAGCombiner] Fold pattern for srl-shl-zext" (#139876)
Reverts llvm/llvm-project#138290 due to buildbot failures in shift-opt.ll
2025-05-14 12:13:54 +01:00
Alexander Peskov
bbc5221c95
[DAGCombiner] Fold pattern for srl-shl-zext (#138290)
Fold `(srl (lop x, (shl (zext y), c1)), c1) -> (lop (srl x, c1), (zext y))` where c1 <= leadingzeros(zext(y)).

This is equivalent to the existing fold chain `(srl (shl (zext y), c1), c1) -> (and (zext y), mask) -> (zext y)`, but a logical op in the middle prevents it from combining.

Benefit: reduces the number of instructions.

---------

Signed-off-by: Alexander Peskov <apeskov@nvidia.com>
2025-05-14 11:57:55 +01:00
AZero13
af6261b50b
[DAG] visitINSERT_VECTOR_ELT - convert to or mask if all insertions are -1 (#138213)
We already did this for insertions of 0 using `and`; we can do the same for insertions of -1 using `or`.
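
A minimal sketch of the equivalence, modelling a vector as a list of 8-bit unsigned lanes. The helper names are hypothetical.

```python
LANE_BITS = 8
ALL_ONES = (1 << LANE_BITS) - 1   # -1 viewed as an unsigned 8-bit lane

def insert_all_ones(vec, indices):
    """Chain of insert_vector_elt operations, each writing -1 to one lane."""
    out = list(vec)
    for i in indices:
        out[i] = ALL_ONES
    return out

def or_with_mask(vec, indices):
    """Single OR with a constant mask: all-ones in the chosen lanes, else 0."""
    mask = [ALL_ONES if i in indices else 0 for i in range(len(vec))]
    return [v | m for v, m in zip(vec, mask)]

vec = [1, 2, 3, 4]
inserted = insert_all_ones(vec, [1, 3])
masked = or_with_mask(vec, [1, 3])
```

OR with all-ones forces a lane to -1 and OR with 0 leaves it unchanged, which mirrors the existing trick of using AND with a mask for insertions of 0.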

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-05-13 17:10:54 +01:00
Paul Walker
e01bdc18e3
[LLVM][SelectionDAG] Simplify SplitVecOp_VSETCC. (#139295)
Preserving the original result element type when splitting vector setcc
operations removes redundant extensions that are awkward to optimise
after the fact.
2025-05-13 12:13:01 +01:00
Matt Arsenault
2f9323bc5b
DAG: Stop forcibly adding nsz to expanded minnum/maxnum (#139615) 2025-05-13 07:37:21 +02:00
Rux124
ef40ae4f4e
[SelectionDAG] Fix incorrect fold condition in foldSetCCWithFunnelShift. (#137637)
Proposed by
[2ed1598](2ed15984b4):

`fshl X, (or X, Y), C ==/!= 0 --> or (srl Y, BW-C), X ==/!= 0`

This transformation is valid when `(C % Bitwidth) != 0`, as verified by
[Alive2](https://alive2.llvm.org/ce/z/TQYM-m).
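
The same condition can also be checked exhaustively on a small bit width. This is a Python sketch using 4-bit values to keep the search small; the helper names are hypothetical, and the Alive2 proof linked above remains the authoritative check.

```python
BW = 4
MASK = (1 << BW) - 1

def fshl(x, y, c):
    """Concatenate x:y, shift left by c (mod BW), keep the high BW bits."""
    c %= BW
    return ((x << c) | (y >> (BW - c))) & MASK

def lhs_is_zero(x, y, c):
    """fshl X, (or X, Y), C == 0"""
    return fshl(x, x | y, c) == 0

def rhs_is_zero(x, y, c):
    """or (srl Y, BW-C), X == 0; the shift is only in range for 0 < C < BW."""
    assert 0 < c < BW
    return ((y >> (BW - c)) | x) == 0

def fold_is_valid(c):
    """Compare both sides for every pair of 4-bit inputs."""
    return all(lhs_is_zero(x, y, c) == rhs_is_zero(x, y, c)
               for x in range(1 << BW) for y in range(1 << BW))
```

When `C % Bitwidth == 0` the rewritten shift amount would be `BW`, which is out of range for `srl`, so the fold has to be skipped in that case.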

Fixes #136746
2025-05-12 13:25:07 +08:00
Kazu Hirata
50e949f3cc
[IR] Teach getAsmString to return StringRef (NFC) (#139406)
This is for consistency with #139401.
2025-05-10 22:59:09 -07:00
Philip Reames
80370465d9
[DAG] Add wrappers for insert_vector_elt and extract_vector_elt [nfc] (#139141)
As with the recently added subvector variants, provide the unsigned
index operand to simplify a bunch of code.

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2025-05-09 06:37:58 -07:00
Philip Reames
cf2f558501 [DAG/RISCV] Continue migrating to getInsertSubvector and getExtractSubvector
Follow up to 6e654caab, use the new routines in more places.  Note that
I've excluded from this patch any case which uses a getConstant index
instead of a getVectorIdxConstant index just to minimize room for
error.  I'll get those in a separate follow up.
2025-05-08 09:40:45 -07:00
Philip Reames
6e654caabe
[DAG] Add wrappers for insert and extract sub-vector [nfc] (#137230)
Mechanical change to introduce the new wrappers, and add enough users to
make the usage pattern clear. Once this lands, I'm going to do a further
pass to adjust more callsites as separate changes.

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2025-05-08 06:49:37 -07:00
Philip Reames
650dca5d89
[IR] Remove the AtomicMem*Inst helper classes (#138710)
Migrate their usage to the `AnyMem*Inst` family, and add a isAtomic()
query on the base class for that hierarchy. This matches the idioms we
use for e.g. isAtomic on load, store, etc.. instructions, the existing
isVolatile idioms on mem* routines, and allows us to more easily share
code between atomic and non-atomic variants.

As with #138568, the goal here is to simplify the class hierarchy and
make it easier to reason about. I'm moving from easiest to hardest, and
will stop at some point when I hit "good enough". Longer term, I'd sorta
like to merge or reverse the naming on the plain Mem*Inst and the
AnyMem*Inst, but that's a much larger and more risky change. Not sure
I'm going to actually do that.
2025-05-06 14:24:40 -07:00
Nicholas Guy
a8ed244178
[DAGCombiner] Add DAG combine for PARTIAL_REDUCE_MLA when no mul op (#131326)
Generic DAG combine for ISD::PARTIAL_REDUCE_U/SMLA to convert:
PARTIAL_REDUCE_*MLA(Acc, ZEXT(UnextOp1), Splat(1)) into
PARTIAL_REDUCE_UMLA(Acc, UnextOp1, TRUNC(Splat(1))) and
PARTIAL_REDUCE_*MLA(Acc, SEXT(UnextOp1), Splat(1)) into
PARTIAL_REDUCE_SMLA(Acc, UnextOp1, TRUNC(Splat(1))).

---------

Co-authored-by: James Chesterman <james.chesterman@arm.com>
2025-05-06 16:54:39 +01:00
Philip Reames
d1b3eeb244
[SDAG] Merge memcpy and memcpy.inline lowering paths (#138619)
This is a follow up to c0a264e, but note that there is a functional
difference here: the root changes for the memcpy.inline case. This
difference appears to have been accidental, but I kept this back to
facilitate separate review in case there's something I'm missing here.
2025-05-06 07:37:44 -07:00
Sander de Smalen
d90cac9641
[DAGCombine] Simplify partial_reduce_*mla with constant. (#138289)
partial_reduce_*mla(acc, mul(ext(x), splat(C)), splat(1))
-> partial_reduce_*mla(acc, x, C)
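
A minimal sketch of why the two forms agree, using an illustrative model of a partial reduction. The lane assignment `i % len(acc)` is an assumption made for this model, and extension and overflow are ignored; `partial_reduce_mla` and `splat` are hypothetical helpers.

```python
def splat(value, n):
    """Vector with `value` in every one of n lanes."""
    return [value] * n

def partial_reduce_mla(acc, a, b):
    """Model: accumulate each product a[i] * b[i] into a lane of acc.

    The lane choice (i % len(acc)) is an assumption of this model only.
    """
    out = list(acc)
    for i, (x, y) in enumerate(zip(a, b)):
        out[i % len(out)] += x * y
    return out

acc = [100, 200]
x = [1, 2, 3, 4]
C = 3

# Before: the multiply by splat(C) is a separate op and the MLA multiplies by 1.
before = partial_reduce_mla(acc, [xi * C for xi in x], splat(1, len(x)))

# After: the constant is folded into the MLA's own multiplicand.
after = partial_reduce_mla(acc, x, splat(C, len(x)))
```

Both forms accumulate the same products `x[i] * C`, so the separate multiply is redundant.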
2025-05-06 13:51:52 +01:00
Simon Pilgrim
bde39d7251
[DAG] Add SDPatternMatch::m_BitwiseLogic common matcher for AND/OR/XOR nodes (#138301) 2025-05-06 12:50:50 +01:00
Philip Reames
c0a264e6a9
[IntrinsicInst] Remove MemCpyInlineInst and MemSetInlineInst [nfc] (#138568)
I'm looking for ways to simplify the Mem*Inst class structure, and these
two seem to have fairly minimal justification, so let's remove them.
2025-05-05 14:07:31 -07:00
Kazu Hirata
cdc9a4b5f8
[CodeGen] Use range-based for loops (NFC) (#138488)
This is a reland of #138434 except that:

- the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp
  have been dropped because they caused a test failure under asan, and

- the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have
  been improved with structured bindings.
2025-05-05 10:08:49 -07:00
Kazu Hirata
f81193ddfd
[SelectionDAG] Remove obsolete comments (NFC) (#138483)
These functions do not return boolean values.
2025-05-05 10:08:19 -07:00
Kazu Hirata
aa15596b5f
[llvm] Remove unused local variables (NFC) (#138478) 2025-05-04 21:33:54 -07:00
Nico Weber
1d955489c3 Revert "[CodeGen] Use range-based for loops (NFC) (#138434)"
This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b.

Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs
(sanitizer builds; macOS; possibly more), see comments on
https://github.com/llvm/llvm-project/pull/138434
2025-05-04 17:36:52 -04:00
Kazu Hirata
c51a3aa6ce
[llvm] Remove unused local variables (NFC) (#138467) 2025-05-04 13:05:18 -07:00
Kazu Hirata
47f391fd0e
[CodeGen] Remove unused local variables (NFC) (#138441) 2025-05-04 00:26:37 -07:00
Kazu Hirata
a9699a334b
[CodeGen] Use range-based for loops (NFC) (#138434) 2025-05-04 00:26:19 -07:00