599 Commits

Author SHA1 Message Date
Matt Arsenault
2502e3b7ba
IR: Promote "denormal-fp-math" to a first class attribute (#174293)
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.
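
To illustrate the "bitfields" idea, here is a minimal sketch. The 2-bit codes, the field order, and every name below are assumptions made for illustration; this is not LLVM's actual encoding. It only shows how an (output, input) pair for the default mode and another pair for the float mode can live in one small integer, so a query is a shift and a mask rather than two string lookups.

```python
from enum import IntEnum


class Mode(IntEnum):
    # Hypothetical 2-bit codes for the denormal handling kinds.
    IEEE = 0
    PRESERVE_SIGN = 1
    POSITIVE_ZERO = 2
    DYNAMIC = 3


def pack(default_out, default_in, f32_out, f32_in):
    """Pack the default and float (output, input) modes into one byte."""
    return (int(default_out)
            | (int(default_in) << 2)
            | (int(f32_out) << 4)
            | (int(f32_in) << 6))


def unpack(bits):
    """Return ((default_out, default_in), (f32_out, f32_in))."""
    fields = [Mode((bits >> shift) & 0b11) for shift in (0, 2, 4, 6)]
    return (fields[0], fields[1]), (fields[2], fields[3])
```

For example, `pack(Mode.DYNAMIC, Mode.DYNAMIC, Mode.PRESERVE_SIGN, Mode.PRESERVE_SIGN)` models the third syntax form in the list that follows, and `unpack` recovers both pairs without consulting a second attribute.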

The syntax in the common cases looks like this:
  `denormal_fpenv(preservesign,preservesign)`
  `denormal_fpenv(float: preservesign,preservesign)`
  `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`

I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).

This changes behavior when the command-line debug options are used
to set the denormal mode. Previously the flags skipped functions
with an explicit attribute set, tracked separately for
the default and f32 versions. Now that these are one attribute,
the flag logic can't distinguish which of the two components
was explicitly set on the function. Only one test appeared to
rely on this behavior, so I just avoided using the flags in it.

This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.

I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the 2 2-element fields flattened into 4 arguments.
2026-02-05 13:31:26 +00:00
Kshitij Paranjape
32cf905428
[AutoUpgrade] Handle invalid x86 intrinsics (#179374)
Fixes #176674 

Continuation of PR #177606.
2026-02-05 11:17:52 +01:00
Stefan Weigl-Bosker
7a2d46c85b
Revert "[AutoUpgrade] Prevent deletion of call if uses still exist (#177606)" (#179340)
This reverts commit 3007e2f050bd36e5e8dab68a5c9abbfbf4561314 (#177606)

Buildbot:

```
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[9/137] Linking CXX shared module unittests/Passes/Plugins/TestPlugin.so
[10/137] Linking CXX executable bin/llvm-config
[11/137] Building CXX object lib/IR/CMakeFiles/LLVMCore.dir/AutoUpgrade.cpp.o
[12/137] Linking CXX static library lib/libLLVMCore.a
[13/137] Generating VCSVersion.inc
[14/135] Linking CXX executable bin/apinotes-test
[15/135] Linking CXX executable bin/llvm-cxxmap
[16/135] Linking CXX executable bin/llvm-bcanalyzer
[17/135] Linking CXX executable bin/llvm-ctxprof-util
[18/135] Linking CXX executable bin/llvm-objcopy
FAILED: bin/llvm-objcopy 
: && /usr/bin/clang++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fuse-ld=lld -Wl,--color-diagnostics    -Wl,--gc-sections  -Xlinker --dependency-file=tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/link.d tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/ObjcopyOptions.cpp.o tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/llvm-objcopy.cpp.o tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/llvm-objcopy-driver.cpp.o -o bin/llvm-objcopy  -Wl,-rpath,"\$ORIGIN/../lib:"  lib/libLLVMObject.a  lib/libLLVMObjCopy.a  lib/libLLVMOption.a  lib/libLLVMSupport.a  lib/libLLVMTargetParser.a  lib/libLLVMMC.a  lib/libLLVMBinaryFormat.a  lib/libLLVMIRReader.a  lib/libLLVMBitReader.a  lib/libLLVMAsmParser.a  lib/libLLVMCore.a  lib/libLLVMRemarks.a  lib/libLLVMBitstreamReader.a  lib/libLLVMMCParser.a  lib/libLLVMTextAPI.a  lib/libLLVMDebugInfoDWARFLowLevel.a  -lrt  -ldl  -lm  /usr/lib/aarch64-linux-gnu/libz.so  /usr/lib/aarch64-linux-gnu/libzstd.so  lib/libLLVMDemangle.a && :
ld.lld: error: undefined symbol: llvm::Value::dump() const
>>> referenced by AutoUpgrade.cpp
>>>               AutoUpgrade.cpp.o:(reportFatalUsageErrorWithCI(llvm::StringRef, llvm::CallBase*)) in archive lib/libLLVMCore.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```
2026-02-02 17:12:04 -05:00
Kshitij Paranjape
3007e2f050
[AutoUpgrade] Prevent deletion of call if uses still exist (#177606)
The calls to llvm.x86.sse2.pshuflw were being deleted due to an invalid
vector type, even though uses still existed. Add checks that prevent
deleting a call while it still has uses, and ensure that wherever
eraseFromParent() is called, it runs after replaceAllUsesWith().
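
The ordering constraint can be shown with a toy model. This is a sketch only: the `Value` class, `erase_from`, and `upgrade_call` are hypothetical stand-ins written for this example, not LLVM's classes or the code in AutoUpgrade.cpp.

```python
class Value:
    """Toy stand-in for an IR value; tracks the instructions that use it."""

    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)
        self.users = []
        for op in self.operands:
            op.users.append(self)

    def replace_all_uses_with(self, new):
        # Rewrite every user to refer to `new`, then forget the old uses.
        for user in self.users:
            user.operands = [new if op is self else op for op in user.operands]
            new.users.append(user)
        self.users = []

    def erase_from(self, block):
        # Erasing a value that is still used would leave dangling operands.
        if self.users:
            raise RuntimeError(f"{self.name} still has uses")
        for op in self.operands:
            op.users.remove(self)
        block.remove(self)


def upgrade_call(block, old_call, new_call):
    """Insert the replacement, redirect uses, and only then erase the old call."""
    block.insert(block.index(old_call), new_call)
    old_call.replace_all_uses_with(new_call)
    old_call.erase_from(block)
```

Calling `erase_from` on a call that still has users raises in this model, while `upgrade_call` succeeds because the uses are redirected first.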

Fixes: #176674
2026-02-02 16:11:13 -05:00
Craig Topper
05e2ee9664
[RISCV] Replace riscv.clmul intrinsic with llvm.clmul (#178092)
I did not replace riscv.clmulh/clmulr since those require a multiple
instruction pattern match. I wanted to ensure that -O0 will select the
correct instructions without relying on combines.
2026-01-26 21:12:48 -08:00
Matt Arsenault
0d4a35d560
IR: Remove llvm.convert.to.fp16 and llvm.convert.from.fp16 intrinsics (#174484)
These are long overdue for removal. These were originally a hack
to support loading half values before there was any / decent support
for the half type through the backend. There's no reason to continue
supporting these, they're equivalent to fpext/fptrunc with a bitcast.

SelectionDAG stopped translating these directly, and used the
bitcast + fp cast since f7a02c17628e825, so there's been no reason
to use these since 2014.
2026-01-21 09:50:28 +00:00
Jonas Paulsson
8eccda10d2
[SystemZ] Add SP alignment to the DataLayout string. (#176041)
Add '-S64' to the SystemZ datalayout string, to avoid overalignment of
stack objects.

Fixes #173402
2026-01-20 09:54:47 -06:00
Srinivasa Ravi
13205c51fc
[clang][NVPTX] Add missing half-precision add/mul/fma intrinsics (#170079)
This change adds the following missing half-precision
add/mul/fma intrinsics for the NVPTX target:
- `llvm.nvvm.add.rn{.ftz}.sat.f16`
- `llvm.nvvm.add.rn{.ftz}.sat.v2f16`
- `llvm.nvvm.mul.rn{.ftz}.sat.f16`
- `llvm.nvvm.mul.rn{.ftz}.sat.v2f16`
- `llvm.nvvm.fma.rn.oob.*`

We lower `fneg` followed by one of the above addition 
intrinsics to the corresponding `sub` instruction.

This also removes some incorrect `bf16` fma intrinsics with no
valid lowering.

PTX spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions
2026-01-20 17:56:55 +05:30
Mikołaj Piróg
d03ce72f40
[IR] Propagate fast-math flags through autoupgraded target intrinsics (#174432)
Fast-math flags were not copied through upgrades; they are now.
2026-01-15 21:15:14 +01:00
Alex MacLean
bc8fcba3bb
[NVPTX][AutoUpgrade] Use integer min/max intrinsics instead of icmp, select (#173097)
2026-01-07 12:28:48 -08:00
Shilei Tian
5a63367b15
Reapply "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674) (#174697)
This reverts commit 0b2f3cfb72a76fa90f3ec2a234caabe0d0712590.
2026-01-07 06:12:19 +00:00
dyung
0b2f3cfb72
Revert "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674)
Reverts llvm/llvm-project#174310

This change is causing 2 cross-project-test failures on
https://lab.llvm.org/buildbot/#/builders/174/builds/29695
2026-01-07 01:18:23 +00:00
Shilei Tian
ccca3b8c67
[AMDGPU] Rework the clamp support for WMMA instructions (#174310)
Fixes #166989.
2026-01-06 15:46:40 -05:00
Luke Lau
ad4bfac732
[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796)
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel

In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.

The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
in line with the definition of `llvm.fshr.*`/`llvm.fshl.*`.

This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.
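
A small model of the two modes and the sign-based upgrade may help. It is a sketch under assumptions: vectors are Python lists, the function names are made up for this example, and the mapping of non-negative offsets to "left" and negative offsets to "right" is my reading of the description above rather than a quote from the patch.

```python
def splice_left(a, b, offset):
    """Slide concat(a, b) left by `offset` elements and keep len(a) of them."""
    n = len(a)
    assert len(b) == n and 0 <= offset <= n
    return (a + b)[offset:offset + n]


def splice_right(a, b, offset):
    """Take the last `offset` elements of a, then fill up from the start of b."""
    n = len(a)
    assert len(b) == n and 0 <= offset <= n
    return (a + b)[n - offset:2 * n - offset]


def upgrade_old_splice(a, b, imm):
    """Model of the old signed-immediate form, expressed via the two new modes."""
    if imm >= 0:
        return splice_left(a, b, imm)
    return splice_right(a, b, -imm)
```

With `a = list("ABCD")` and `b = list("EFGH")`, both `upgrade_old_splice(a, b, 1)` and `upgrade_old_splice(a, b, -3)` select B, C, D, E. Once the direction is its own intrinsic, the offset no longer needs a sign, which is what makes a non-constant offset workable.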

Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.

Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
2026-01-06 15:41:26 +08:00
Shilei Tian
c97de4387b
Revert "[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069)" (#174303)
This reverts commit 2c376ffeca490a5732e4fd6e98e5351fcf6d692a because it
breaks assembler.

```
$ llvm-mc -triple=amdgcn -mcpu=gfx1250 -show-encoding <<< "v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] matrix_b_reuse"
  v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] clamp ; encoding: [0x10,0x80,0x72,0xcc,0x00,0x11,0x42,0x1c]
```

We have a fundamental issue in the clamp support in VOP3P instructions,
which will need more changes.
2026-01-04 02:13:21 +00:00
Muhammad Abdul
2c376ffeca
[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069)
Fixes #166989 

- Adds a clamp immediate operand to the AMDGPU WMMA iu8 intrinsic and
threads it through LLVM IR, MIR lowering, Clang builtins/tests, and MLIR
ROCDL dialect so all layers agree on the new operand
- Updates AMDGPUWmmaIntrinsicModsAB so the clamp attribute is emitted,
teaches VOP3P encoding to accept the immediate, and adjusts Clang
codegen/builtin headers plus MLIR op definitions and tests to match
- Documents what the WMMA clamp operand does
- Implements bitcode AutoUpgrade for source compatibility on the WMMA IU8
intrinsic op

Possible future enhancements:
- infer clamping as an optimization fold based on the use context

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-12-27 12:51:29 -05:00
Kevin Per
fc1fd1065b
[AutoUpgrade]: Fixed assertion by considering number of args (#172911)
The assertion was violated because the intrinsic had too many arguments.
In that case, fall back to the default handling.

Closes https://github.com/llvm/llvm-project/issues/172817
2025-12-19 10:02:20 +00:00
Alex MacLean
a40f444265
[NVPTX] Add support for barrier.cta.red.* instructions (#172541)
This change adds full support for the PTX `barrier.cta.red` instruction,
following the same conventions as are already used for
`barrier.cta.sync` and `barrier.cta.arrive`.

In addition this MR removes the following intrinsics which are no longer
needed:
* llvm.nvvm.barrier0.popc -->
  llvm.nvvm.barrier.cta.red.popc.aligned.all(0, c)
* llvm.nvvm.barrier0.and -->
  llvm.nvvm.barrier.cta.red.and.aligned.all(0, z)
* llvm.nvvm.barrier0.or -->
  llvm.nvvm.barrier.cta.red.or.aligned.all(0, z)
2025-12-18 18:06:27 -08:00
Nikita Popov
b7c0452a9a
[PowerPC][AIX] Specify correct ABI alignment for double (#144673)
Add `f64:32:64` to the data layout for AIX, to indicate that doubles
have a 32-bit ABI alignment and 64-bit preferred alignment.

Clang was already taking this into account, but it was not reflected in
LLVM's data layout.

A notable effect of this change is that `double` loads/stores with 4
byte alignment are no longer considered "unaligned" and avoid the
corresponding unaligned access legalization. I assume that this is
correct/desired for AIX. (The codegen previously already relied on this
in some places related to the call ABI simply by dint of assuming
certain stack locations were 8 byte aligned, even though they were only
actually 4 byte aligned.)
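
For readers unfamiliar with the data layout syntax, a minimal sketch of how an entry such as `f64:32:64` reads. The helper names are hypothetical, and the sketch handles only the `f<size>:<abi>[:<pref>]` form, with all values in bits.

```python
def parse_float_spec(spec):
    """Parse a float data layout entry like 'f64:32:64' into bit widths."""
    head, *aligns = spec.split(":")
    if not head.startswith("f") or not aligns:
        raise ValueError(f"not a float layout entry: {spec!r}")
    abi = int(aligns[0])
    # The preferred alignment defaults to the ABI alignment when omitted.
    pref = int(aligns[1]) if len(aligns) > 1 else abi
    return {"size": int(head[1:]), "abi": abi, "pref": pref}


def is_abi_aligned(byte_alignment, spec):
    """True if an access with this byte alignment meets the ABI alignment."""
    return byte_alignment * 8 >= parse_float_spec(spec)["abi"]
```

Under this reading, `is_abi_aligned(4, "f64:32:64")` is true while `is_abi_aligned(4, "f64:64")` is false, which mirrors why 4-byte-aligned `double` accesses stop counting as unaligned.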

Fixes https://github.com/llvm/llvm-project/issues/133599.
2025-12-11 08:57:26 +01:00
anjenner
27651133e2
AMDGPU: Drop and upgrade llvm.amdgcn.atomic.csub/cond.sub to atomicrmw (#105553)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
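
The two behaviors can be sketched as follows. This models 32-bit unsigned values in plain Python; the names follow the `usub_cond` and `usub_sat` operations of `atomicrmw`, and it is an assumption here that those are the operations the upgrade targets. No claim is made about which intrinsic maps to which operation.

```python
MASK32 = 0xFFFFFFFF


def usub_cond(old, val):
    """Subtract only if the result would not go negative; otherwise keep old."""
    return (old - val) & MASK32 if old >= val else old


def usub_sat(old, val):
    """Subtract, clamping to zero if the result would go negative."""
    return (old - val) & MASK32 if old >= val else 0


def atomicrmw(memory, addr, op, val):
    """Apply `op` to memory[addr]; like atomicrmw, return the previous value."""
    old = memory[addr]
    memory[addr] = op(old, val)
    return old
```

Starting from a stored value of 5 and subtracting 7, `usub_cond` leaves 5 in memory and `usub_sat` stores 0; in both cases the call returns the old value, 5.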
2025-12-09 23:13:33 +00:00
BaiXilin
4f79552d25
[x86][AVX-VNNI] Fix VPDPWXXD Argument Types (#169456)
Fixed the argument types of the following intrinsics to match with the
ISA:
 - vpdpwssd_128, vpdpwssd_256, vpdpwssd_512,
 - vpdpwssds_128, vpdpwssds_256, vpdpwssds_512
 - vpdpwsud_128, vpdpwsud_256, vpdpwsud_512
 - vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512
 - vpdpwusd_128, vpdpwusd_256, vpdpwusd_512
 - vpdpwusds_128, vpdpwusds_256, vpdpwusds_512
 - vpdpwuud_128, vpdpwuud_256, vpdpwuud_512
 - vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512

Fixes #97271. Note that this is the last PR for the issue.
2025-12-09 17:10:20 +00:00
Paul Walker
b5a3b8b704
[LLVM][SVE] Remove aarch64.sve.rev intrinsic, using vector.reverse instead. (#169654)
2025-11-28 11:59:34 +00:00
Jakub Kuderski
4c21d0cb14
[ADT] Prepare to deprecate variadic StringSwitch::Cases. NFC. (#166020)
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.

For more context, see https://github.com/llvm/llvm-project/pull/163117.
2025-11-02 00:12:33 +00:00
Alex MacLean
4a383f9ff7
[NVPTX] Add ex2.approx bf16 support and cleanup intrinsic definition (#165446)
2025-11-01 17:51:17 +00:00
Nikita Popov
12bf1836de
[AutoUpgrade] Gracefully handle invalid alignment on masked intrinsics
Generate a usage error instead of asserting.
2025-10-22 12:47:26 +02:00
Daniel Kiss
048070ba6f
[ARM][AArch64] BTI,GCS,PAC Module flag update. (#86212)
The module flag indicates that the feature should be propagated to the
function. Now that the frontend emits all attributes accordingly, let
the auto-upgrade do this work only when old and new bitcode are
merged.

Depends on #82819 and #86031
2025-10-22 09:29:06 +02:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
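
The change in the zero-alignment rule can be sketched like this. All function names are hypothetical, and the upgrade step is an assumption about what a meaning-preserving conversion would look like, not a transcription of AutoUpgrade.cpp. Alignments are in bytes.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def old_gather_align(align_arg, abi_align):
    """Old masked.gather/scatter rule: 0 meant the element type's ABI alignment."""
    return abi_align if align_arg == 0 else align_arg


def new_align(align_attr=None):
    """New rule: a missing `align` attribute implies alignment 1."""
    return 1 if align_attr is None else align_attr


def upgrade_gather_align(align_arg, abi_align):
    """Pick an explicit attribute value that preserves the old meaning."""
    effective = old_gather_align(align_arg, abi_align)
    if not is_power_of_two(effective):
        # Reject bad input with an error rather than asserting.
        raise ValueError(f"invalid alignment {effective}")
    return effective
```

In this model an old gather with alignment argument 0 on a type with 4-byte ABI alignment upgrades to an explicit `align 4`; simply dropping the argument would silently weaken it to 1.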
2025-10-20 08:50:09 +00:00
Joseph Huber
728e925476
[AMDGPU] Auto-upgrade ELF mangling in the data layout (#163644)
Summary:
The changes in https://github.com/llvm/llvm-project/pull/163011 caused
all ELF platforms to default to ELF mangling. We want to auto upgrade
this for linking in new programs to old ones.
2025-10-17 09:00:42 -05:00
BaiXilin
0d9dd60815
[x86][AVX-VNNI] Fix VPDPBXXD Argument Type (#159222)
Fixed intrinsic VPDP[SS,SU,UU]D[,S]_128/256/512's argument types to match with the ISA.
Fixes part of #97271.
2025-09-30 09:41:12 +00:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
BaiXilin
94e2c19f86
[x86][AVX-VNNI] Fix VPDPBUSD Argument Types (#155194)
Fixed intrinsic VPDPBUSD[,S]_128/256/512's argument types to match with the ISA.

Fixes part of #97271
2025-09-10 12:24:16 +00:00
Alexandre Ganea
5cda2424c8
[LLD][COFF] Add more --time-trace tags for ThinLTO linking (#156471)
In order to better see what's going on during ThinLTO linking, this PR
adds more profile tags when using `--time-trace` on a `lld-link.exe`
invocation.

After this PR, linking `clang.exe`:

[Screenshot 2025-09-02 082021: https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910]

Linking a custom (Unreal Engine game) binary gives a completely
different picture, probably because of using Unity files, and the sheer
amount of input files (here, providing over 60 GB of .OBJs/.LIBs).

[Screenshot 2025-09-02 102048: https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0]
2025-09-05 15:28:19 -04:00
Alex MacLean
06bcc34e3d
[NVPTX] Auto-upgrade nvvm.grid_constant to param attribute (#155489)
Upgrade the !"grid_constant" !nvvm.annotation to a "nvvm.grid_constant"
attribute. This attribute is much simpler for front-ends to apply and
faster and simpler to query.
2025-08-27 16:32:28 -07:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
2025-08-18 07:01:29 -07:00
Nikita Popov
02f3e95a42
[AutoUpgrade] Fix use after free
Determine the intrinsic ID before the name is freed during renaming.
2025-08-08 11:54:09 +02:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Meredith Julian
be58069515
[LLVM][NVPTX] Upstream tanh intrinsic for libdevice (#149596)
Currently __nv_fast_tanhf() in libdevice maps to an nvvm intrinsic that
has not been upstreamed, which is causing issues when using the NVPTX
backend from upstream. Instead of upstreaming the intrinsic, we can
instead use the existing Intrinsic::tanh with the afn flag. This change
adds NVPTX backend support for ISD::TANH, adds auto-upgrade for the old
tanh_approx intrinsic to @llvm.tanh.f32 with afn flag so that libdevice
works properly upstream, and adds a basic codegen test and a case to the
auto-upgrade test.
2025-07-24 14:32:59 -07:00
Nikita Popov
92c55a315e
[IR] Only allow lifetime.start/end on allocas (#149310)
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.

However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.

* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.

I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.

This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)

In practice, I've only found a few places that currently produce
lifetimes on non-allocas:

* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
2025-07-21 15:04:50 +02:00
David Green
9fcea2e465
[ARM] Add neon vector support for roundeven
As per #142559, this marks froundeven as legal for Neon and upgrades the
existing arm.neon.vrintn intrinsics.
2025-07-04 15:27:33 +01:00
David Green
ec35065789
[ARM] Add neon vector support for rint
As per #142559, this marks frint as legal for Neon and upgrades the existing
arm.neon.vrintx intrinsics.
2025-07-03 21:27:48 +01:00
David Green
1f8f477bd0
[ARM] Add neon vector support for trunc
As per #142559, this marks ftrunc as legal for Neon and upgrades the existing
arm.neon.vrintz intrinsics.
2025-07-03 07:41:13 +01:00
David Green
5332534b9c
[ARM] Add neon vector support for ceil
As per #142559, this marks fceil as legal for Neon and upgrades the existing
arm.neon.vrintp intrinsics.
2025-07-01 15:41:10 +01:00
David Green
6bd9ff04af
[ARM] Add neon vector support for round
As per #142559, this marks fround as legal for Neon and upgrades the existing
arm.neon.vrinta intrinsics.
2025-06-30 17:15:26 +01:00
David Green
dcc9e36b18
[ARM] Add neon vector support for floor (#142559)
This marks ffloor as legal providing that armv8 and neon are present (or
fullfp16 for the fp16 instructions). The existing arm_neon_vrintm
intrinsics are auto-upgraded to llvm.floor.

If this is OK I will update the other vrint intrinsics.
2025-06-29 11:37:16 +01:00
Nikita Popov
9a6a87da6e
[AutoUpgrade] Remove unnecessary name check (NFCI)
If only the name is incorrect (due to added overload), but the
signature is correct, we should go through the generic remangling
upgrade.
2025-06-23 14:56:24 +02:00
Durgadoss R
3e5d50f9c6
[NVPTX] Add cta_group support to TMA G2S intrinsics (#143178)
This patch extends the TMA G2S intrinsics with the
support for cta_group::1/2 available from Blackwell onwards.
The existing intrinsics are auto-upgraded with a default
value of '0' for the `cta_group` flag operand.

* lit tests are added for all combinations of the newer variants.
* Negative tests are added to validate the error-handling 
   when the value of the cta_group flag falls out-of-range.
* The generated PTX is verified with a 12.8 ptxas executable.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-06-12 15:20:39 +05:30
Jeremy Morse
459475020a
Reapply 76197ea6f91f after removing an assertion
Specifically this is the assertion in BasicBlock.cpp. Now that we're not
examining or setting that flag consistently (because it'll be deleted in
about an hour) there's no need to keep this assertion.

Original commit title:

[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)
2025-06-11 17:35:29 +01:00
Jeremy Morse
76197ea6f9
Revert "[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)"
This reverts commit c71a2e688828ab3ede4fb54168a674ff68396f61.

/me squints -- this is hitting an assertion I thought had been deleted,
will revert and investigate for a bit.
2025-06-11 14:52:17 +01:00
Jeremy Morse
c71a2e6888
[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)
These are opportunistic deletions as more places that make use of the
IsNewDbgInfoFormat flag are removed. It should (TM)(R) all be dead code
now that `IsNewDbgInfoFormat` should be true everywhere.

FastISel: we don't need to do debug-aware instruction counting any more,
because there are no debug instructions.
Autoupgrade: you can no longer avoid autoupgrading of intrinsics to
records.
DIBuilder: Delete the code for creating debug intrinsics (!)
LoopUtils: No need to handle debug instructions, they don't exist.
2025-06-11 14:43:15 +01:00
Jeremy Morse
3d7aa961ac
[DebugInfo][RemoveDIs] Use autoupgrader to convert old debug-info (#143452)
By chance, two things have prevented the autoupgrade path being
exercised much so far:
 * LLParser setting the debug-info mode to "old" on seeing intrinsics,
* The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info
block.

In practice, this appears to mean this code path hasn't seen the various
invalid inputs that can come its way. This commit does a number of
things:
* Tolerates the various illegal inputs that can be written with
debug-intrinsics, and that must be tolerated until the Verifier runs,
 * Printing illegal/null DbgRecord fields must succeed,
* Verifier errors need to localise the function/block where the error
is,
 * Tests that now see debug records will print debug-record errors,

Plus a few new tests for other intrinsic-to-debug-record failure modes
I found. There are also two edge cases:
* Some of the unit tests switch back and forth between intrinsic and
record modes at will; I've deleted coverage and some assertions to
tolerate this as intrinsic support is now Gone (TM),
* In sroa-extract-bits.ll, the order of debug records flips. This is
because the autoupgrader upgrades in the opposite order to the basic
block conversion routines... which doesn't change the record order, but
_does_ change the use list order in Metadata! This should (TM) have no
consequence to the correctness of LLVM, but will change the order of
various records and the order of DWARF record output too.

I tried to reduce this patch to a smaller collection of changes, but
they're all intertwined, sorry.
2025-06-11 13:56:30 +01:00