607 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
5f99854d01
[AMDGPU] Drop A and B neg modifier from amdgcn_wmma_bf16_16x16x32_bf16 (#189468)
Fixes: LCOMPILER-1673
2026-03-30 14:14:22 -07:00
Stanislav Mekhanoshin
a2d84b5d8d
[AMDGPU] Remove neg support from 4 more gfx1250 WMMA (#189115)
These were previously covered by AMDGPUWmmaIntrinsicModsAllReuse.
2026-03-27 15:20:14 -07:00
Stanislav Mekhanoshin
e69c7312f3
[AMDGPU] Disable neg_lo[0:1] and neg_hi[0:1] on wmma_f32_16x16x32_bf16 (#188649)
This is the pilot change, the rest will follow the same idea.
2026-03-26 00:37:05 -07:00
Alex MacLean
a9775221ae
[NVPTX] Canonicalize NVVM attribute strings and refactor property queries (NFC) (#187752)
2026-03-23 08:07:09 -07:00
Akshay Deodhar
8b265cf270
[NVPTX][AutoUpgrade] atom.load intrinsics should be autoupgraded to monotonic atomicrmw for NVPTX (#187140)
Prior to https://github.com/llvm/llvm-project/pull/179553, the seq_cst
qualifier was being ignored. The expected codegen for these intrinsics
is `atom.relaxed`, which corresponds to `Monotonic`. The fix does to
AutoUpgrade what https://github.com/llvm/llvm-project/pull/185822 does
to clang.
2026-03-18 01:26:47 +00:00
Shilei Tian
f05d2e8a39
[AMDGPU] Make uniform-work-group-size a valueless attribute (#183925)
The "uniform-work-group-size" function attribute previously took a
string value of "true" or "false". Since presence alone can convey the
"true" semantics and absence can convey "false", the value is
unnecessary.

This patch converts it to a valueless string attribute: presence
indicates true, absence indicates false. For backward compatibility,
auto-upgrade logic is added in both UpgradeAttributes (bitcode) and
UpgradeFunctionAttributes: if the old value is "true", the attribute is
kept without a value; if "false", the attribute is removed.
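
The upgrade rule above can be sketched on a toy attribute map. This is only an illustration: `StringAttrs`, `upgradeUniformWorkGroupSize` and `hasUniformWorkGroupSize` are hypothetical names, not LLVM's attribute API, and values other than "true" or "false" are left untouched because the message does not cover them.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy stand-in for a function's string attributes (not LLVM's AttributeList).
// std::nullopt models a valueless attribute.
using StringAttrs = std::map<std::string, std::optional<std::string>>;

// Hypothetical helper that applies the upgrade rule described above.
void upgradeUniformWorkGroupSize(StringAttrs &Attrs) {
  const std::string Key = "uniform-work-group-size";
  auto It = Attrs.find(Key);
  if (It == Attrs.end() || !It->second.has_value())
    return; // Absent or already valueless: nothing to upgrade.
  if (*It->second == "true")
    It->second.reset(); // Keep the attribute, drop the value.
  else if (*It->second == "false")
    Attrs.erase(It); // Absence now conveys "false".
  // Any other value is left as-is (assumption; not covered by the message).
}

// Query in the new scheme: presence alone means true.
bool hasUniformWorkGroupSize(const StringAttrs &Attrs) {
  return Attrs.count("uniform-work-group-size") != 0;
}
```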
2026-03-01 21:29:55 +00:00
Shilei Tian
70905e0afa
[RFC][IR] Remove Constant::isZeroValue (#181521)
`Constant::isZeroValue` currently behaves the same as
`Constant::isNullValue` for all types except floating-point, where it
additionally returns true for negative zero (`-0.0`). However, in
practice, almost all callers operate on integer/pointer types where the
two are equivalent, and the few FP-relevant callers have no meaningful
dependence on the `-0.0` behavior.

This PR removes `isZeroValue` to eliminate the confusing API. All
callers are changed to `isNullValue` with no test failures.

`isZeroValue` will be reintroduced in a future change with clearer
semantics: when null pointers may have non-zero bit patterns,
`isZeroValue` will check for bitwise-all-zeros, while `isNullValue` will
check for the semantic null (which may be non-zero).
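
The `-0.0` distinction can be seen in a small standalone sketch. The helper names below are hypothetical and do not use LLVM's `Constant` class; they only contrast an all-zero-bits check with a numeric-zero check on a plain `double`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the raw bit pattern of a double (assumes a 64-bit IEEE-754 double).
std::uint64_t bitsOf(double D) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "expects a 64-bit double");
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// All-zero-bits check: only +0.0 qualifies.
bool isAllZeroBits(double D) { return bitsOf(D) == 0; }

// Numeric-zero check: -0.0 qualifies as well, because -0.0 == 0.0.
bool isNumericZero(double D) { return D == 0.0; }
```

For `-0.0` only the sign bit is set, so the two checks disagree on that one value.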
2026-02-15 12:06:42 -05:00
Sam Elliott
0d08cb0e70
[outliners] Turn nooutline into an Enum Attribute (#163665)
This change turns the `"nooutline"` attribute into an enum attribute
called `nooutline`, and adds an auto-upgrader for bitcode to make the
same change to existing IR.

This IR attribute disables both the Machine Outliner (enabled at Oz for
some targets), and the IR Outliner (disabled by default).
2026-02-10 21:44:17 -08:00
Matt Arsenault
2502e3b7ba
IR: Promote "denormal-fp-math" to a first class attribute (#174293)
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.

The syntax in the common cases looks like this:
  `denormal_fpenv(preservesign,preservesign)`
  `denormal_fpenv(float: preservesign,preservesign)`
  `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`
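
As a rough sketch of what storing the default and float modes as bitfields could look like, the following packs two (output, input) pairs into one byte. The bit layout, the enum values and the names `packFPEnv` and `unpackMode` are assumptions made for illustration; the actual encoding in LLVM may differ.

```cpp
#include <cassert>
#include <cstdint>

// The four denormal handling kinds; the numeric values are illustrative.
enum class DenormalKind : std::uint8_t {
  IEEE = 0,
  PreserveSign = 1,
  PositiveZero = 2,
  Dynamic = 3
};

struct DenormalMode {
  DenormalKind Output;
  DenormalKind Input;
};

// Hypothetical layout, two bits per component:
//   bits [1:0] default output, bits [3:2] default input,
//   bits [5:4] float output,   bits [7:6] float input.
std::uint8_t packFPEnv(DenormalMode Default, DenormalMode Float) {
  auto Bits = [](DenormalKind K) { return static_cast<unsigned>(K); };
  unsigned Packed = Bits(Default.Output) | (Bits(Default.Input) << 2) |
                    (Bits(Float.Output) << 4) | (Bits(Float.Input) << 6);
  return static_cast<std::uint8_t>(Packed);
}

// Reads back either the default mode or the float mode from the packed byte.
DenormalMode unpackMode(std::uint8_t Packed, bool IsFloat) {
  unsigned Field = (Packed >> (IsFloat ? 4 : 0)) & 0xF;
  return {static_cast<DenormalKind>(Field & 0x3),
          static_cast<DenormalKind>(Field >> 2)};
}
```

With a layout like this, a single load answers both queries without parsing two string attributes.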

I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).

This changes behavior when the debug command-line flags are used
to set the denormal mode. Previously the flags skipped functions
that had an explicit attribute set, separately for the default
and f32 versions. Now that these are one attribute, the flag
logic can't distinguish which of the two components was
explicitly set on the function. Only one test appeared to
rely on this behavior, so I just avoided using the flags in it.

This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.

I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the 2 2-element fields flattened into 4 arguments.
2026-02-05 13:31:26 +00:00
Kshitij Paranjape
32cf905428
[AutoUpgrade] Handle invalid x86 intrinsics (#179374)
Fixes #176674 

Continuation of PR #177606.
2026-02-05 11:17:52 +01:00
Stefan Weigl-Bosker
7a2d46c85b
Revert "[AutoUpgrade] Prevent deletion of call if uses still exist (#177606)" (#179340)
This reverts commit 3007e2f050bd36e5e8dab68a5c9abbfbf4561314 (#177606)

Buildbot:

```
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[9/137] Linking CXX shared module unittests/Passes/Plugins/TestPlugin.so
[10/137] Linking CXX executable bin/llvm-config
[11/137] Building CXX object lib/IR/CMakeFiles/LLVMCore.dir/AutoUpgrade.cpp.o
[12/137] Linking CXX static library lib/libLLVMCore.a
[13/137] Generating VCSVersion.inc
[14/135] Linking CXX executable bin/apinotes-test
[15/135] Linking CXX executable bin/llvm-cxxmap
[16/135] Linking CXX executable bin/llvm-bcanalyzer
[17/135] Linking CXX executable bin/llvm-ctxprof-util
[18/135] Linking CXX executable bin/llvm-objcopy
FAILED: bin/llvm-objcopy 
: && /usr/bin/clang++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fuse-ld=lld -Wl,--color-diagnostics    -Wl,--gc-sections  -Xlinker --dependency-file=tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/link.d tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/ObjcopyOptions.cpp.o tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/llvm-objcopy.cpp.o tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/llvm-objcopy-driver.cpp.o -o bin/llvm-objcopy  -Wl,-rpath,"\$ORIGIN/../lib:"  lib/libLLVMObject.a  lib/libLLVMObjCopy.a  lib/libLLVMOption.a  lib/libLLVMSupport.a  lib/libLLVMTargetParser.a  lib/libLLVMMC.a  lib/libLLVMBinaryFormat.a  lib/libLLVMIRReader.a  lib/libLLVMBitReader.a  lib/libLLVMAsmParser.a  lib/libLLVMCore.a  lib/libLLVMRemarks.a  lib/libLLVMBitstreamReader.a  lib/libLLVMMCParser.a  lib/libLLVMTextAPI.a  lib/libLLVMDebugInfoDWARFLowLevel.a  -lrt  -ldl  -lm  /usr/lib/aarch64-linux-gnu/libz.so  /usr/lib/aarch64-linux-gnu/libzstd.so  lib/libLLVMDemangle.a && :
ld.lld: error: undefined symbol: llvm::Value::dump() const
>>> referenced by AutoUpgrade.cpp
>>>               AutoUpgrade.cpp.o:(reportFatalUsageErrorWithCI(llvm::StringRef, llvm::CallBase*)) in archive lib/libLLVMCore.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```
2026-02-02 17:12:04 -05:00
Kshitij Paranjape
3007e2f050
[AutoUpgrade] Prevent deletion of call if uses still exist (#177606)
Calls to llvm.x86.sse2.pshuflw were being deleted because of an invalid
vector type, even though uses still existed. This adds checks that
prevent deleting a call while uses still exist and ensures that, when
eraseFromParent() is called, it is called after replaceAllUsesWith().

Fixes: #176674
2026-02-02 16:11:13 -05:00
Craig Topper
05e2ee9664
[RISCV] Replace riscv.clmul intrinsic with llvm.clmul (#178092)
I did not replace riscv.clmulh/clmulr since those require a multiple
instruction pattern match. I wanted to ensure that -O0 will select the
correct instructions without relying on combines.
2026-01-26 21:12:48 -08:00
Matt Arsenault
0d4a35d560
IR: Remove llvm.convert.to.fp16 and llvm.convert.from.fp16 intrinsics (#174484)
These are long overdue for removal. These were originally a hack
to support loading half values before there was any / decent support
for the half type through the backend. There's no reason to continue
supporting these, they're equivalent to fpext/fptrunc with a bitcast.

SelectionDAG stopped translating these directly, and used the
bitcast + fp cast since f7a02c17628e825, so there's been no reason
to use these since 2014.
2026-01-21 09:50:28 +00:00
Jonas Paulsson
8eccda10d2
[SystemZ] Add SP alignment to the DataLayout string. (#176041)
Add '-S64' to the SystemZ datalayout string, to avoid overalignment of
stack objects.

Fixes #173402
2026-01-20 09:54:47 -06:00
Srinivasa Ravi
13205c51fc
[clang][NVPTX] Add missing half-precision add/mul/fma intrinsics (#170079)
This change adds the following missing half-precision
add/mul/fma intrinsics for the NVPTX target:
- `llvm.nvvm.add.rn{.ftz}.sat.f16`
- `llvm.nvvm.add.rn{.ftz}.sat.v2f16`
- `llvm.nvvm.mul.rn{.ftz}.sat.f16`
- `llvm.nvvm.mul.rn{.ftz}.sat.v2f16`
- `llvm.nvvm.fma.rn.oob.*`

We lower `fneg` followed by one of the above addition 
intrinsics to the corresponding `sub` instruction.

This also removes some incorrect `bf16` fma intrinsics with no
valid lowering.

PTX spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions
2026-01-20 17:56:55 +05:30
Mikołaj Piróg
d03ce72f40
[IR] Propagate fast-math flags through autoupgraded target intrinsics (#174432)
Fast-math flags were not copied through upgrades; they are now.
2026-01-15 21:15:14 +01:00
Alex MacLean
bc8fcba3bb
[NVPTX][AutoUpgrade] Use integer min/max intrinsics instead of icmp, select (#173097)
2026-01-07 12:28:48 -08:00
Shilei Tian
5a63367b15
Reapply "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674) (#174697)
This reverts commit 0b2f3cfb72a76fa90f3ec2a234caabe0d0712590.
2026-01-07 06:12:19 +00:00
dyung
0b2f3cfb72
Revert "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674)
Reverts llvm/llvm-project#174310

This change is causing 2 cross-project-test failures on
https://lab.llvm.org/buildbot/#/builders/174/builds/29695
2026-01-07 01:18:23 +00:00
Shilei Tian
ccca3b8c67
[AMDGPU] Rework the clamp support for WMMA instructions (#174310)
Fixes #166989.
2026-01-06 15:46:40 -05:00
Luke Lau
ad4bfac732
[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796)
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel

In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.

The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
in line with the definition of `llvm.fshr.*`/`llvm.fshl.*`.

This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.
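
The sign-based split can be modelled on plain vectors. This is a sketch under assumptions: the functions below are hypothetical stand-ins for the intrinsics, the old form is modelled as "take VL elements of the concatenation starting at `Imm`, or at `VL + Imm` when `Imm` is negative", and the left and right forms are modelled by analogy with funnel shifts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Element I of the concatenation A ++ B.
static int concatAt(const Vec &A, const Vec &B, std::size_t I) {
  return I < A.size() ? A[I] : B[I - A.size()];
}

// Takes A.size() consecutive elements of A ++ B starting at Start.
static Vec window(const Vec &A, const Vec &B, std::size_t Start) {
  assert(A.size() == B.size() && Start <= A.size());
  Vec R(A.size());
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = concatAt(A, B, Start + I);
  return R;
}

// Model of the old intrinsic: the sign of the immediate picks the mode.
Vec spliceOld(const Vec &A, const Vec &B, long Imm) {
  const long VL = static_cast<long>(A.size());
  assert(Imm >= -VL && Imm < VL);
  return window(A, B, static_cast<std::size_t>(Imm >= 0 ? Imm : VL + Imm));
}

// Models of the two new intrinsics, each with a non-negative offset.
Vec spliceLeft(const Vec &A, const Vec &B, std::size_t Offset) {
  return window(A, B, Offset);
}
Vec spliceRight(const Vec &A, const Vec &B, std::size_t Offset) {
  assert(Offset <= A.size());
  return window(A, B, A.size() - Offset);
}

// Upgrade rule: a non-negative offset maps to "left", a negative one to "right".
Vec upgradeSplice(const Vec &A, const Vec &B, long Imm) {
  return Imm >= 0 ? spliceLeft(A, B, static_cast<std::size_t>(Imm))
                  : spliceRight(A, B, static_cast<std::size_t>(-Imm));
}
```

Because each new form takes a non-negative offset, the mode no longer depends on a sign that is only known for constants.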

Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.

Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
2026-01-06 15:41:26 +08:00
Shilei Tian
c97de4387b
Revert "[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069)" (#174303)
This reverts commit 2c376ffeca490a5732e4fd6e98e5351fcf6d692a because it
breaks assembler.

```
$ llvm-mc -triple=amdgcn -mcpu=gfx1250 -show-encoding <<< "v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] matrix_b_reuse"
  v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] clamp ; encoding: [0x10,0x80,0x72,0xcc,0x00,0x11,0x42,0x1c]
```

We have a fundamental issue in the clamp support in VOP3P instructions,
which will need more changes.
2026-01-04 02:13:21 +00:00
Muhammad Abdul
2c376ffeca
[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069)
Fixes #166989 

- Adds a clamp immediate operand to the AMDGPU WMMA iu8 intrinsic and
threads it through LLVM IR, MIR lowering, Clang builtins/tests, and MLIR
ROCDL dialect so all layers agree on the new operand
- Updates AMDGPUWmmaIntrinsicModsAB so the clamp attribute is emitted,
teaches VOP3P encoding to accept the immediate, and adjusts Clang
codegen/builtin headers plus MLIR op definitions and tests to match
- Documents what the WMMA clamp operand does
- Implements bitcode AutoUpgrade for source compatibility on WMMA IU8
Intrinsic op

Possible future enhancements:
- infer clamping as an optimization fold based on the use context

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-12-27 12:51:29 -05:00
Kevin Per
fc1fd1065b
[AutoUpgrade]: Fixed assertion by considering number of args (#172911)
The assertion was violated because the intrinsic had too many arguments.
In that case, fall back to the default handling.

Closes https://github.com/llvm/llvm-project/issues/172817
2025-12-19 10:02:20 +00:00
Alex MacLean
a40f444265
[NVPTX] Add support for barrier.cta.red.* instructions (#172541)
This change adds full support for the PTX `barrier.cta.red` instruction,
following the same conventions as are already used for
`barrier.cta.sync` and `barrier.cta.arrive`.

In addition this MR removes the following intrinsics which are no longer
needed:
* llvm.nvvm.barrier0.popc -->
  llvm.nvvm.barrier.cta.red.popc.aligned.all(0, c)
* llvm.nvvm.barrier0.and -->
  llvm.nvvm.barrier.cta.red.and.aligned.all(0, z)
* llvm.nvvm.barrier0.or -->
  llvm.nvvm.barrier.cta.red.or.aligned.all(0, z)
2025-12-18 18:06:27 -08:00
Nikita Popov
b7c0452a9a
[PowerPC][AIX] Specify correct ABI alignment for double (#144673)
Add `f64:32:64` to the data layout for AIX, to indicate that doubles
have a 32-bit ABI alignment and 64-bit preferred alignment.

Clang was already taking this into account, but it was not reflected in
LLVM's data layout.

A notable effect of this change is that `double` loads/stores with 4
byte alignment are no longer considered "unaligned" and avoid the
corresponding unaligned access legalization. I assume that this is
correct/desired for AIX. (The codegen previously already relied on this
in some places related to the call ABI simply by dint of assuming
certain stack locations were 8 byte aligned, even though they were only
actually 4 byte aligned.)
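
To make the `f64:32:64` component concrete, here is a small sketch that parses such a float alignment spec and checks whether an access meets the ABI alignment. `FloatAlign`, `parseFloatAlign` and `isAbiAligned` are hypothetical helpers written for this illustration; they are not LLVM's `DataLayout` parser and ignore trailing characters.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Parsed form of a spec such as "f64:32:64" (all values in bits).
struct FloatAlign {
  unsigned SizeBits;
  unsigned AbiBits;
  unsigned PrefBits;
};

// Parses "f<size>:<abi>:<pref>"; returns std::nullopt on malformed input.
std::optional<FloatAlign> parseFloatAlign(const std::string &Spec) {
  std::istringstream In(Spec);
  char Kind = 0, C1 = 0, C2 = 0;
  FloatAlign R{};
  if (!(In >> Kind >> R.SizeBits >> C1 >> R.AbiBits >> C2 >> R.PrefBits))
    return std::nullopt;
  if (Kind != 'f' || C1 != ':' || C2 != ':')
    return std::nullopt;
  return R;
}

// An access is ABI-aligned when its alignment covers the ABI alignment.
bool isAbiAligned(const FloatAlign &A, unsigned AccessAlignBytes) {
  return AccessAlignBytes * 8 >= A.AbiBits;
}
```

Under `f64:32:64` a 4-byte-aligned `double` access passes this check, whereas it would not under a 64-bit ABI alignment.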

Fixes https://github.com/llvm/llvm-project/issues/133599.
2025-12-11 08:57:26 +01:00
anjenner
27651133e2
AMDGPU: Drop and upgrade llvm.amdgcn.atomic.csub/cond.sub to atomicrmw (#105553)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
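
The two conditional subtractions can be sketched with `std::atomic`. The names `subOrKeep`, `subOrZero` and `atomicRMW` are hypothetical; the update rules follow the LangRef descriptions of `atomicrmw usub_cond` and `atomicrmw usub_sat`, assuming those are the operations the intrinsics are upgraded to.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Subtract, or keep the old value when Val is larger (unsigned compare).
std::uint32_t subOrKeep(std::uint32_t Old, std::uint32_t Val) {
  return Old >= Val ? Old - Val : Old;
}

// Subtract, or clamp to zero when Val is larger (unsigned compare).
std::uint32_t subOrZero(std::uint32_t Old, std::uint32_t Val) {
  return Old >= Val ? Old - Val : 0;
}

// Generic read-modify-write loop; returns the value seen before the update,
// as atomicrmw does.
template <typename Op>
std::uint32_t atomicRMW(std::atomic<std::uint32_t> &Mem, std::uint32_t Val,
                        Op Update) {
  std::uint32_t Old = Mem.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak reloads Old, and the new value is
  // recomputed on the next iteration.
  while (!Mem.compare_exchange_weak(Old, Update(Old, Val),
                                    std::memory_order_relaxed)) {
  }
  return Old;
}
```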
2025-12-09 23:13:33 +00:00
BaiXilin
4f79552d25
[x86][AVX-VNNI] Fix VPDPWXXD Argument Types (#169456)
Fixed the argument types of the following intrinsics to match with the
ISA:
 - vpdpwssd_128, vpdpwssd_256, vpdpwssd_512,
 - vpdpwssds_128, vpdpwssds_256, vpdpwssds_512
 - vpdpwsud_128, vpdpwsud_256, vpdpwsud_512
 - vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512
 - vpdpwusd_128, vpdpwusd_256, vpdpwusd_512
 - vpdpwusds_128, vpdpwusds_256, vpdpwusds_512
 - vpdpwuud_128, vpdpwuud_256, vpdpwuud_512
 - vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512

Fixes #97271. Note that this is the last PR for the issue.
2025-12-09 17:10:20 +00:00
Paul Walker
b5a3b8b704
[LLVM][SVE] Remove aarch64.sve.rev intrinsic, using vector.reverse instead. (#169654)
2025-11-28 11:59:34 +00:00
Jakub Kuderski
4c21d0cb14
[ADT] Prepare to deprecate variadic StringSwitch::Cases. NFC. (#166020)
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.

For more context, see https://github.com/llvm/llvm-project/pull/163117.
2025-11-02 00:12:33 +00:00
Alex MacLean
4a383f9ff7
[NVPTX] Add ex2.approx bf16 support and cleanup intrinsic definition (#165446)
2025-11-01 17:51:17 +00:00
Nikita Popov
12bf1836de
[AutoUpgrade] Gracefully handle invalid alignment on masked intrinsics
Generate a usage error instead of asserting.
2025-10-22 12:47:26 +02:00
Daniel Kiss
048070ba6f
[ARM][AArch64] BTI,GCS,PAC Module flag update. (#86212)
A module flag indicates that a feature should be propagated to the
functions. Now that the frontend emits all attributes accordingly,
limit the auto-upgrade so that it only does work when old and new
bitcode files are merged.

Depends on #82819 and #86031
2025-10-22 09:29:06 +02:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
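
The change in the zero-alignment special case can be written out as two small functions. Both names are hypothetical, and the idea that an upgrade would materialize the ABI alignment as an explicit `align` attribute is an assumption, not something stated in this message.

```cpp
#include <cassert>
#include <optional>

// Old masked.gather/masked.scatter convention: an alignment immarg of 0
// meant "use the ABI alignment of the element type".
unsigned effectiveOldAlign(unsigned ImmAlign, unsigned ElemAbiAlign) {
  assert(ElemAbiAlign != 0 && "ABI alignment must be non-zero");
  return ImmAlign == 0 ? ElemAbiAlign : ImmAlign;
}

// New convention: a missing align attribute implies alignment 1.
unsigned effectiveNewAlign(std::optional<unsigned> AlignAttr) {
  return AlignAttr.value_or(1);
}
```

Feeding the result of `effectiveOldAlign` into the attribute keeps the old meaning, while simply dropping a zero immarg would weaken the alignment to 1.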
2025-10-20 08:50:09 +00:00
Joseph Huber
728e925476
[AMDGPU] Auto-upgrade ELF mangling in the data layout (#163644)
Summary:
The changes in https://github.com/llvm/llvm-project/pull/163011 caused
all ELF platforms to default to ELF mangling. We want to auto-upgrade
this so that new programs can be linked with old ones.
2025-10-17 09:00:42 -05:00
BaiXilin
0d9dd60815
[x86][AVX-VNNI] Fix VPDPBXXD Argument Type (#159222)
Fixed intrinsic VPDP[SS,SU,UU]D[,S]_128/256/512's argument types to match with the ISA.
Fixes part of #97271.
2025-09-30 09:41:12 +00:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
BaiXilin
94e2c19f86
[x86][AVX-VNNI] Fix VPDPBUSD Argument Types (#155194)
Fixed intrinsic VPDPBUSD[,S]_128/256/512's argument types to match with the ISA.

Fixes part of #97271
2025-09-10 12:24:16 +00:00
Alexandre Ganea
5cda2424c8
[LLD][COFF] Add more --time-trace tags for ThinLTO linking (#156471)
In order to better see what's going on during ThinLTO linking, this PR
adds more profile tags when using `--time-trace` on a `lld-link.exe`
invocation.

After this PR, linking `clang.exe`:

[Image: screenshot 2025-09-02 082021, https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910]

Linking a custom (Unreal Engine game) binary gives a completely
different picture, probably because of using Unity files, and the sheer
amount of input files (here, providing over 60 GB of .OBJs/.LIBs).

[Image: screenshot 2025-09-02 102048, https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0]
2025-09-05 15:28:19 -04:00
Alex MacLean
06bcc34e3d
[NVPTX] Auto-upgrade nvvm.grid_constant to param attribute (#155489)
Upgrade the !"grid_constant" !nvvm.annotation to a "nvvm.grid_constant"
attribute. This attribute is much simpler for front-ends to apply and
faster and simpler to query.
2025-08-27 16:32:28 -07:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
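
The redirection mechanism is ordinary partial specialization, which a toy version can show. `ToySet` and `ToyPtrSet` below are hypothetical classes backed by `std::set`; they only mirror the shape of the specialization quoted above and say nothing about how the real containers are implemented.

```cpp
#include <cassert>
#include <set>
#include <type_traits>

// Toy pointer set standing in for SmallPtrSet.
template <typename PtrT, unsigned N> class ToyPtrSet {
  std::set<PtrT> Items;

public:
  bool insert(PtrT P) { return Items.insert(P).second; }
  bool contains(PtrT P) const { return Items.count(P) != 0; }
};

// Toy general-purpose set standing in for SmallSet.
template <typename T, unsigned N> class ToySet {
  std::set<T> Items;

public:
  bool insert(const T &V) { return Items.insert(V).second; }
  bool contains(const T &V) const { return Items.count(V) != 0; }
};

// Partial specialization: for pointer element types, ToySet is just a
// ToyPtrSet under another name.
template <typename PointeeType, unsigned N>
class ToySet<PointeeType *, N> : public ToyPtrSet<PointeeType *, N> {};

static_assert(std::is_base_of_v<ToyPtrSet<int *, 4>, ToySet<int *, 4>>,
              "pointer element types are redirected to the pointer set");
```

Since the pointer case already resolves to the pointer set, spelling that type directly at the use site changes nothing functionally.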
2025-08-18 07:01:29 -07:00
Nikita Popov
02f3e95a42
[AutoUpgrade] Fix use after free
Determine the intrinsic ID before the name is freed during renaming.
2025-08-08 11:54:09 +02:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Meredith Julian
be58069515
[LLVM][NVPTX] Upstream tanh intrinsic for libdevice (#149596)
Currently __nv_fast_tanhf() in libdevice maps to an nvvm intrinsic that
has not been upstreamed, which is causing issues when using the NVPTX
backend from upstream. Instead of upstreaming the intrinsic, we can
instead use the existing Intrinsic::tanh with the afn flag. This change
adds NVPTX backend support for ISD::TANH, adds auto-upgrade for the old
tanh_approx intrinsic to @llvm.tanh.f32 with afn flag so that libdevice
works properly upstream, and adds a basic codegen test and a case to the
auto-upgrade test.
2025-07-24 14:32:59 -07:00
Nikita Popov
92c55a315e
[IR] Only allow lifetime.start/end on allocas (#149310)
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.

However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.

* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.

I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.

This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)

In practice, I've only found a few places that currently produce
lifetimes on non-allocas:

* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
2025-07-21 15:04:50 +02:00
David Green
9fcea2e465
[ARM] Add neon vector support for roundeven
As per #142559, this marks froundeven as legal for Neon and upgrades the
existing arm.neon.vrintn intrinsics.
2025-07-04 15:27:33 +01:00
David Green
ec35065789
[ARM] Add neon vector support for rint
As per #142559, this marks frint as legal for Neon and upgrades the existing
arm.neon.vrintx intrinsics.
2025-07-03 21:27:48 +01:00
David Green
1f8f477bd0
[ARM] Add neon vector support for trunc
As per #142559, this marks ftrunc as legal for Neon and upgrades the existing
arm.neon.vrintz intrinsics.
2025-07-03 07:41:13 +01:00
David Green
5332534b9c
[ARM] Add neon vector support for ceil
As per #142559, this marks fceil as legal for Neon and upgrades the existing
arm.neon.vrintp intrinsics.
2025-07-01 15:41:10 +01:00