2182 Commits

Author SHA1 Message Date
Daniel Thornburgh
75c4d315dc Take symbol name by GlobalValue again to avoid modifying Module 2025-08-21 14:28:08 -07:00
Daniel Thornburgh
5a056672cb Take symbol name by metadata arg rather than ptr to GlobalValue 2025-08-18 11:37:40 -07:00
Daniel Thornburgh
f3fdeff3e3 [IR] llvm.reloc.none intrinsic for no-op symbol references
This intrinsic emits a BFD_RELOC_NONE relocation at the point of call,
which allows optimizations and languages to explicitly pull in symbols
from static libraries without there being any code or data that has an
effectual relocation against such a symbol.

See issue #146159 for context.
2025-08-18 11:37:40 -07:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType *, N> : public SmallPtrSet<PointeeType *, N> {};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
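A minimal sketch of the mechanical replacement (the surrounding names are illustrative, not taken from the patch):

```
#include "llvm/ADT/SmallPtrSet.h"

struct Node {};

// Before: llvm::SmallSet<Node *, 8> Visited;
// After:  spell the pointer set directly.
static bool firstVisit(llvm::SmallPtrSet<Node *, 8> &Visited, Node *N) {
  // insert() returns {iterator, inserted}; true means N was not seen before.
  return Visited.insert(N).second;
}
```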
2025-08-18 07:01:29 -07:00
Matt Arsenault
53e9d3247e
DAG: Remove unnecessary getPointerTy call (#154055)
getValueType already did this
2025-08-18 17:12:16 +09:00
Nikita Popov
238c3dcd0d
[CodeGen][Mips] Remove fp128 libcall list (#153798)
Mips requires fp128 args/returns to be passed differently than i128. It
handles this by inspecting the pre-legalization type. However, for soft
float libcalls, the original type is currently not provided (it will
look like an i128 call). To work around that, MIPS maintains a list of
libcalls working on fp128.

This patch removes that list by providing the original, pre-softening
type to calling convention lowering. This is done by carrying additional
information in CallLoweringInfo, as we unfortunately do need both types
(we want the un-softened type for OrigTy, but we need the softened type
for the actual register assignment etc.)

This is in preparation for completely removing all the custom
pre-analysis code in the Mips backend and replacing it with use of
OrigTy.
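
A minimal sketch of the kind of check this enables, assuming the pre-softening IR type is made available to the lowering code as `OrigTy` (the parameter name is illustrative):

```
#include "llvm/IR/Type.h"

// With the original type threaded through, an fp128 origin can be
// distinguished from an i128 origin directly, with no libcall name list.
static bool argCameFromFP128(const llvm::Type *OrigTy) {
  return OrigTy->isFP128Ty();
}
```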
2025-08-18 09:22:41 +02:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Nikita Popov
11c2240049 [SDAGBuilder] Rename RetTys -> RetVTs (NFC)
Make it clearer that this is a vector of EVTs, not IR types.

Based on:
https://github.com/llvm/llvm-project/pull/153798#discussion_r2279066696
2025-08-15 17:06:33 +02:00
Philip Reames
606937474e
[SDAG] Remove IndexType manipulation in getUniformBase and callers (#151578)
All paths set it to the same value, just propagate that value to the
consumer.
2025-08-15 08:00:47 -07:00
Diana Picus
ac005e16f6
Reapply "[AMDGPU] Intrinsic for launching whole wave functions" (#153584)
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a cmake issue which has been
discussed in more detail in this Discourse post:

https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901

If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:

https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds

The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Tail calls are handled in a future patch.
2025-08-15 10:12:47 +02:00
Nikita Popov
16ad20291d [TargetLowering] Store Context in variable (NFC)
Avoid repeating CLI.RetTy->getContext() many times.
2025-08-14 17:13:31 +02:00
Nikita Popov
498ef361fe
[CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414)
https://github.com/llvm/llvm-project/pull/152709 exposed the original IR
argument type to the CC lowering logic. However, in SDAG, this used the
raw type, prior to aggregate splitting. This PR changes it to use the
non-aggregate type instead. (This matches what happened in the
GlobalISel case already.)

I've also added some more detailed documentation on the
InputArg/OutputArg fields, to explain how they differ. In most cases
ArgVT is going to be the EVT of OrigTy, so they encode very similar
information (OrigTy just preserves some additional information lost in
EVTs, like pointer types). One case where they do differ is in
post-legalization lowering of libcalls, where ArgVT is going to be a
legalized type, while OrigTy is going to be the original non-legalized
type.
2025-08-13 18:42:26 +02:00
Nikita Popov
240c454c4d
[CodeGen] Remove default ctors for InputArg and OutputArg (#153205)
These make it easy to forget to initialize some members, like the newly
added OrigTy. Force these to always go through the ctor instead.
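
A minimal sketch of the pattern (not the actual LLVM declarations): deleting the default constructor forces every member, including a newly added one, to be supplied at construction time.

```
struct OutputArgLike {
  int Flags;
  int OrigTyId; // stands in for the newly added OrigTy member

  OutputArgLike() = delete; // forgetting to initialize is now a compile error
  OutputArgLike(int Flags, int OrigTyId) : Flags(Flags), OrigTyId(OrigTyId) {}
};
```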
2025-08-13 10:51:43 +02:00
XChy
2a49719525
[SelectionDAGBuilder] Look for appropriate INLINEASM_BR instruction to verify (#152591)
Partially fix #149023.
The original code `MRI.def_begin(Reg)->getParent()` may return the
incorrect MI, as the physical register `Reg` may have multiple
definitions.
This patch selects the correct MI to verify by comparing the MBB of each
definition.
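
A minimal sketch of the idea, assuming a helper of this shape (the function itself is illustrative, not the patch's code):

```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

// Walk every definition of the (possibly physical) register and pick the
// one in the expected block, instead of trusting def_begin() to be it.
static llvm::MachineInstr *findDefIn(llvm::MachineRegisterInfo &MRI,
                                     llvm::Register Reg,
                                     llvm::MachineBasicBlock *MBB) {
  for (llvm::MachineInstr &DefMI : MRI.def_instructions(Reg))
    if (DefMI.getParent() == MBB)
      return &DefMI;
  return nullptr;
}
```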

The new testcase hangs with -O1/2/3 enabled; BranchFolding may be to
blame.
2025-08-12 12:37:56 +00:00
Nikita Popov
e92b7e9641
[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from an i128 argument.

The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).

This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet; this just does
the mechanical changes to thread through the new argument.
2025-08-11 08:57:53 +02:00
Alexander Richardson
3a4b351ba1
[IR] Introduce the ptrtoaddr instruction
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:

1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
   index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.
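
A minimal sketch of the documented extraction semantics, modeled on a raw pointer bit pattern with APInt (zero-extension on widening is an assumption here, mirroring ptrtoint):

```
#include "llvm/ADT/APInt.h"

// Keep only the low index-width (address) bits of the pointer's bit
// pattern, then zero-extend or truncate to the destination integer width.
static llvm::APInt ptrToAddrValue(const llvm::APInt &PtrBits,
                                  unsigned IndexWidth, unsigned DestWidth) {
  return PtrBits.zextOrTrunc(IndexWidth).zextOrTrunc(DestWidth);
}
```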

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/139357
2025-08-08 10:12:39 -07:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
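
A minimal sketch of recovering the implied size from the alloca itself (whether consumers do exactly this is an assumption):

```
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"
#include <optional>

// With the explicit size argument gone, the marked region is simply the
// alloca's own allocation size.
static std::optional<llvm::TypeSize>
impliedLifetimeSize(const llvm::AllocaInst &AI, const llvm::DataLayout &DL) {
  return AI.getAllocationSize(DL);
}
```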

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
zhijian lin
093439c688
[PowerPC][AIX] Using millicode for memcmp instead of libcall (#147093)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __memcmp routine is a millicode implementation;
we use millicode for the memcmp function instead of a library call to
improve performance.
2025-08-07 13:13:56 -04:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information about whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and backends
have developed all kinds of workarounds to access it anyway.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
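
A minimal sketch of the intended usage; the accessor name is an assumption based on the description above:

```
#include "llvm/CodeGen/TargetCallingConv.h"

// With the bit carried in ArgFlagsTy, a calling-convention function can
// branch on vararg-ness directly, without target-specific pre-analysis.
static bool assignToVarArgArea(const llvm::ISD::ArgFlagsTy &Flags) {
  return Flags.isVarArg(); // accessor name assumed from the patch summary
}
```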
2025-08-07 09:12:40 +02:00
Diana Picus
14cd133931
Revert "[AMDGPU] Intrinsic for launching whole wave functions" (#152286)
Reverts llvm/llvm-project#145859 because it broke a HIP test:
```
[34/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
FAILED: External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o 
/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/opt/botworker/llvm/External/hip/rocm-6.3.0 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /home/botworker/bbot/clang-hip-vega20/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
fatal error: error in backend: Cannot select: intrinsic %llvm.amdgcn.readfirstlane
```
2025-08-06 12:24:52 +02:00
Diana Picus
0461cd3d1d
[AMDGPU] Intrinsic for launching whole wave functions (#145859)
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Unspeakable horrors happen around calls from whole wave functions; the
plan is to improve the handling of caller/callee-saved registers in
a future patch.

Tail calls are also handled in a future patch.
2025-08-06 10:25:53 +02:00
Craig Topper
73685583c8
[VP][RISCV] Add a vp.load.ff intrinsic for fault-only-first load. (#128593)
There's been some interest in supporting early-exit loops recently.
https://discourse.llvm.org/t/rfc-supporting-more-early-exit-loops/84690

This patch was extracted from our downstream where we've been using it
in our vectorizer.
2025-08-05 16:12:42 -07:00
Nikita Popov
86727fe9a1
[IR] Allow poison argument to lifetime markers (#151148)
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.

It's worth noting that this does not require any conservative
assumptions; lifetimes with poison arguments can simply be skipped.
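
A minimal sketch of the skip, given the pointer operand of a lifetime marker (operand extraction is left out deliberately, since the argument list changed in later commits):

```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Value.h"

// A lifetime marker whose pointer argument is poison carries no
// information and can simply be ignored.
static bool lifetimeIsSkippable(const llvm::Value *PtrOp) {
  return llvm::isa<llvm::PoisonValue>(PtrOp);
}
```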

Fixes https://github.com/llvm/llvm-project/issues/151119.
2025-08-04 10:02:04 +02:00
Nikita Popov
ab1f6ce482
[IR][SDAG] Remove lifetime size handling from SDAG (#150944)
Split out from https://github.com/llvm/llvm-project/pull/150248:

Specify that the argument of lifetime.start/lifetime.end is ignored and
will be removed in the future.

Remove lifetime size handling from SDAG. The size was previously
discarded during isel, so was always ignored for stack coloring anyway.
Where necessary, obtain the size of the full frame index.
2025-07-29 09:53:59 +02:00
paperchalice
21836f4a49
[SelectionDAG] Remove UnsafeFPMath in LegalizeDAG (#146316)
These global flags hinder further improvements like [[RFC] Honor pragmas
with
-ffp-contract=fast](https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast)
and pass concurrency support. Remove them incrementally.
2025-07-29 08:41:21 +08:00
Nikita Popov
a7a1df8f72
[CodeGen] Remove handling for lifetime.start/end on non-alloca (#149838)
After https://github.com/llvm/llvm-project/pull/149310 we are guaranteed
that the argument is an alloca, so we don't need to look at underlying
objects (which was not a correct thing to do anyway).

This also drops the offset argument for lifetime nodes in SDAG. The
offset is fixed to zero now. (Peculiarly, while SDAG pretended to have
an offset, it was just silently dropped during selection.)
2025-07-22 09:44:59 +02:00
Fraser Cormack
284dd5ba84
[SelectionDAG] Fix misplaced commas in operand bundle errors (#149331) 2025-07-17 21:18:05 +01:00
Florian Mayer
5458151817
[SelectionDAG] improve error messages for invalid operand bundle (#148945) 2025-07-15 17:30:48 -07:00
Florian Mayer
be200e2b80
[SelectionDAG] improve error message for invalid op bundles (#148722) 2025-07-14 20:41:10 -07:00
Florian Mayer
14dc3e3d5f
[SelectionDAG] [KCFI] Allow "kcfi" on invoke (#148742)
This is handled in CallBase, so it is valid for both call and invoke
2025-07-14 18:55:09 -07:00
Philip Reames
220a002396
[SDAG] Prefer scalar for prefix of vector GEP expansion (#146719)
When generating SDAG for a getelementptr with a vector result, we were
previously generating splats for each scalar operand. This essentially
has the effect of aggressively vectorizing the sequence, leaving it to
later combines to scalarize if profitable.

Instead, we can keep the accumulating address as a scalar for as long as
the prefix of operands allows before lazily converting to vector on the
first vector operand. This both better fits hardware which frequently
has a scalar base on the scatter/gather instructions, and reduces the
addressing cost even when not as otherwise we end up with a scalar to
vector domain crossing for each scalar operand.

Note that constant splat offsets are treated as scalar for the above,
and only variable offsets can force a conversion to vector.

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2025-07-02 18:16:27 -07:00
Matt Arsenault
f0d898f36b
DAG: Move get_dynamic_area_offset type check to IR verifier (#145268)
Also fix the LangRef to match the implementation. This was checking
against the alloca address space size rather than the default address
space.

The check was also more permissive than the LangRef. The error
check permitted any size less than the pointer size; follow the
stricter wording of the LangRef.
2025-06-24 11:11:52 +09:00
Orlando Cazalet-Hyams
36038a1048
[RemoveDIs][NFC] Remove dbg intrinsic handling code from SelectionDAG ISel (#144702) 2025-06-18 16:04:18 +01:00
Jeremy Morse
9eb0020555
[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389)
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what were) type-generic lambdas.

This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.

Co-authored-by: Nikita Popov <github@npopov.com>
2025-06-17 15:55:14 +01:00
Paul Walker
465e3ce9f1
[LLVM][CodeGen] Lower ConstantInt vectors like shufflevector base splats. (#144395)
ConstantInt vectors utilise DAG.getConstant() when constructing the
initial DAG. This can have the effect of legalising the constant before
the DAG combiner is run, significantly altering the generated code. To
mitigate this (hopefully as a temporary measure) we instead try to
construct the DAG in the same way as shufflevector based splats.
2025-06-17 11:09:22 +01:00
Omair Javaid
e1e1836bbd
[CodeGen] Inline stack guard check on Windows (#136290)
This patch optimizes the Windows security cookie check mechanism by
moving the comparison inline and only calling __security_check_cookie
when the check fails. This reduces the overhead of making a DLL call 
for every function return.

Previously, we implemented this optimization through a machine pass
(X86WinFixupBufferSecurityCheckPass) in PR #95904 submitted by
@mahesh-attarde. We have reverted that pass in favor of this new 
approach. We have also abandoned the AArch64-specific implementation of
the same pass in PR #121938 in favor of this more general solution.

The old machine instruction pass approach:
- Scanned the generated code to find __security_check_cookie calls
- Modified these calls by splitting basic blocks
- Added comparison logic and conditional branching
- Required complex block management and live register computation

The new approach:
- Implements the same optimization during instruction selection
- Directly emits the comparison and conditional branching
- No need for post-processing or basic block manipulation
- Disables optimization at -Oz.

Thanks @tamaspetz, @efriedma-quic and @arsenm for their help.
2025-06-12 19:38:42 +05:00
Abhishek Kaushik
d6ecd6a658
[SelectionDAG][X86] Handle llvm.type.test in DAGBuilder (#142939)
Closes #142937
2025-06-09 10:44:55 +05:30
Nikita Popov
b3ce9883f3
[SelectionDAG] Use reportFatalUsageError() for invalid operand bundles (#142613)
Replace the asserts with reportFatalUsageError(), as these can be
reached with invalid user-provided IR.
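
A minimal sketch of the pattern (the header location is an assumption):

```
#include "llvm/Support/ErrorHandling.h"

// Invalid user-provided IR is a usage error, not an internal invariant,
// so diagnose it cleanly instead of asserting.
static void checkBundleTagKnown(bool KnownBundleTag) {
  if (!KnownBundleTag)
    llvm::reportFatalUsageError("invalid operand bundle on intrinsic call");
}
```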

Fixes https://github.com/llvm/llvm-project/issues/142531.
2025-06-04 09:33:05 +02:00
Simon Tatham
56acb06bc6
[ARM,AArch64] Don't put BTI at asm goto branch targets (#141562)
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.

This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:

1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.

2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens it's direct), and since the code
paths are hot, also a noticeable performance hit.

Implementation:

`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.

Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.
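
A minimal sketch of the more specific query described above; whether the Arm backends use exactly these two functions is an assumption:

```
#include "llvm/CodeGen/MachineBasicBlock.h"

// The block needs a BTI only if its address is genuinely taken (by machine
// code or by the IR block), not merely because it is an asm-goto target.
static bool needsBTIForAddressTaken(const llvm::MachineBasicBlock &MBB) {
  return MBB.isMachineBlockAddressTaken() || MBB.isIRBlockAddressTaken();
}
```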

Testing:

The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.

Further work:

`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.
2025-06-03 08:44:13 +01:00
Nikita Popov
ea096c98ae
[SDAG] Remove noundef workaround for range metadata/attributes (#141745)
In https://reviews.llvm.org/D157685 I changed SDAG to only transfer
range metadata to SDAG if it also has !noundef. At the time, this was
necessary because SDAG incorrectly propagated poison when folding
logical and/or to bitwise and/or.

The root cause of that issue has since been addressed by
https://github.com/llvm/llvm-project/pull/84924, so drop the workaround
now.
2025-05-30 10:56:49 +02:00
Fabian Ritter
8adcc8a669
[SelectionDAG] Introduce ISD::PTRADD (#140017)
This opcode represents the addition of a pointer value (first operand)
and an integer offset (second operand). PTRADD nodes are only generated
if the TargetMachine opts in by overriding
TargetMachine::shouldPreservePtrArith().

The PTRADD node and respective visitPTRADD() function were adapted by
@rgwott from the CHERI/Morello LLVM tree.
Original authors: @davidchisnall, @jrtc27, @arichardson.

The changes in this PR were extracted from PR #105669.

---------

Co-authored-by: David Chisnall <github@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
Co-authored-by: Alexander Richardson <alexrichardson@google.com>
Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
2025-05-28 09:09:17 +02:00
Luke Lau
3033f202f6
[IR] Add llvm.vector.[de]interleave{4,6,8} (#139893)
This adds [de]interleave intrinsics for factors of 4,6,8, so that every
interleaved memory operation supported by the in-tree targets can be
represented by a single intrinsic.

For context, [de]interleaves of fixed-length vectors are represented by
a series of shufflevectors. The intrinsics are needed for scalable
vectors, and we don't currently scalably vectorize all possible factors
of interleave groups supported by RISC-V/AArch64.

The underlying reason for this is that higher factors are currently
represented by interleaving multiple interleaves themselves, which made
sense at the time in the discussion in
https://github.com/llvm/llvm-project/pull/89018.

But after trying to integrate these for higher factors on RISC-V I think
we should revisit this design choice:

- Matching these in InterleavedAccessPass is non-trivial: We currently
only support factors that are a power of 2, and detecting this requires
a good chunk of code
- The shufflevector masks used for [de]interleaves of fixed-length
vectors are much easier to pattern match as they are strided patterns,
but for the intrinsics it's much more complicated to match as the
structure is a tree.
- Unlike shufflevectors, there's no optimisation that happens on
[de]interleave2 intrinsics
- For non-power-of-2 factors e.g. 6, there are multiple possible ways a
[de]interleave could be represented, see the discussion in #139373
- We already have intrinsics for 2,3,5 and 7, so by avoiding 4,6 and 8
we're not really saving much

By representing these higher factors as interleaved interleaves, we can
in theory support arbitrarily high interleave factors. However I'm not
sure this is actually needed in practice: SVE only has instructions
for factors 2,3,4, whilst RVV only supports up to factor 8.

This patch would make it much easier to support scalable interleaved
accesses in the loop vectorizer for RISC-V for factors 3,5,6 and 7, as
the loop vectorizer and InterleavedAccessPass wouldn't need to
construct and match trees of interleaves.

For interleave factors above 8, for which there are no hardware memory
operations to match in the InterleavedAccessPass, we can still keep the
wide load + recursive interleaving in the loop vectorizer.
2025-05-26 18:45:12 +01:00
Pierre van Houtryve
b5e2a236b9
[CodeGen] Add SSID & Atomic Ordering to IntrinsicInfo (#140896)
getTgtMemIntrinsic should be able to propagate such information to the
MMO
2025-05-22 11:42:01 +02:00
Alexander Richardson
07e2ba445d
[AMDGPU] Set AS8 address width to 48 bits
Of the 128 bits of a buffer descriptor, only 48 bits are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logical conclusion is to set the index width to 48 bits instead of
the current value of 128.

Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.

Reviewed By: krzysz00

Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-19 17:26:05 -07:00
Kerry McLaughlin
0bc3993716
[SelectionDAG] Add an ISD node for get.active.lane.mask (#139084)
For now expansion still happens in SelectionDAGBuilder when
GET_ACTIVE_LANE_MASK is not legal on the target.

This patch also includes changes in AArch64ISelLowering to replace
handling of the get.active.lane.mask intrinsic to use the ISD node.
Tablegen patterns are added which match to whilelo for scalable types.

A follow up change will add support for more types to be lowered to
GET_ACTIVE_LANE_MASK by allowing splitting of the node.
2025-05-15 09:14:46 +01:00
YunQiang Su
780054d3ff
CodeGen: Add ISD::AssertNoFPClass (#138839)
It is used to mark a value that we are sure is not of some fcType.
Examples include:

  * A function argument marked with nofpclass
  * The output value of an intrinsic known not to be of some type

This lets subsequent operations make assumptions based on that knowledge.
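
A minimal sketch of the kind of assumption this enables, assuming the node carries an FPClassTest mask of excluded classes (how the mask is attached to the node is not shown):

```
#include "llvm/ADT/FloatingPointMode.h"

// If fcNan is among the excluded classes, an is-NaN check on the asserted
// value can be folded to false.
static bool knownNotNaN(llvm::FPClassTest Excluded) {
  return (Excluded & llvm::fcNan) == llvm::fcNan;
}
```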
2025-05-15 16:05:15 +08:00
Kazu Hirata
50e949f3cc
[IR] Teach getAsmString to return StringRef (NFC) (#139406)
This is for consistency with #139401.
2025-05-10 22:59:09 -07:00
Philip Reames
650dca5d89
[IR] Remove the AtomicMem*Inst helper classes (#138710)
Migrate their usage to the `AnyMem*Inst` family, and add an isAtomic()
query on the base class for that hierarchy. This matches the idioms we
use for e.g. isAtomic on load, store, etc. instructions, the existing
isVolatile idioms on mem* routines, and allows us to more easily share
code between atomic and non-atomic variants.

As with #138568, the goal here is to simplify the class hierarchy and
make it easier to reason about. I'm moving from easiest to hardest, and
will stop at some point when I hit "good enough". Longer term, I'd sorta
like to merge or reverse the naming on the plain Mem*Inst and the
AnyMem*Inst, but that's a much larger and more risky change. Not sure
I'm going to actually do that.
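
A minimal sketch of the shared code path this enables (the helper is illustrative):

```
#include "llvm/IR/IntrinsicInst.h"

// Atomic and non-atomic memcpy/memmove variants can now be handled in one
// place and distinguished with a single query.
static bool needsElementAtomicLowering(const llvm::AnyMemTransferInst &MT) {
  return MT.isAtomic();
}
```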
2025-05-06 14:24:40 -07:00
Philip Reames
d1b3eeb244
[SDAG] Merge memcpy and memcpy.inline lowering paths (#138619)
This is a follow up to c0a264e, but note that there is a functional
difference here: the root changes for the memcpy.inline case. This
difference appears to have been accidental, but I kept this back to
facilitate separate review in case there's something I'm missing here.
2025-05-06 07:37:44 -07:00
Philip Reames
c0a264e6a9
[IntrinsicInst] Remove MemCpyInlineInst and MemSetInlineInst [nfc] (#138568)
I'm looking for ways to simplify the Mem*Inst class structure, and these
two seem to have fairly minimal justification, so let's remove them.
2025-05-05 14:07:31 -07:00