llvm-project

Author	SHA1	Message	Date
Matt Arsenault	2502e3b7ba	IR: Promote "denormal-fp-math" to a first class attribute (#174293 ) Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first class denormal_fpenv attribute. Previously the query for the effective denormal mode involved two string attribute queries with parsing. I'm introducing more uses of this, so it makes sense to convert this to a more efficient encoding. The old representation was also awkward since it was split across two separate attributes. The new encoding just stores the default and float modes as bitfields, largely avoiding the need to consider if the other mode is set. The syntax in the common cases looks like this: `denormal_fpenv(preservesign,preservesign)` `denormal_fpenv(float: preservesign,preservesign)` `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` I wasn't sure about reusing the float type name instead of adding a new keyword. It's parsed as a type but only accepts float. I'm also debating switching the name to subnormal to match the current preferred IEEE terminology (also used by nofpclass and other contexts). This has a behavior change when using the command flag debug options to set the denormal mode. The behavior of the flag ignored functions with an explicit attribute set, per the default and f32 version. Now that these are one attribute, the flag logic can't distinguish which of the two components were explicitly set on the function. Only one test appeared to rely on this behavior, so I just avoided using the flags in it. This also does not perform all the code cleanups this enables. In particular the attributor handling could be cleaned up. I also guessed at how to support this in MLIR. I followed MemoryEffects as a reference; it appears bitfields are expanded into arguments to attributes, so the representation there is a bit uglier with the 2 2-element fields flattened into 4 arguments.	2026-02-05 13:31:26 +00:00
Kshitij Paranjape	32cf905428	[AutoUpgrade] Handle invalid x86 intrinsics (#179374 ) Fixes #176674 Continuation of PR #177606.	2026-02-05 11:17:52 +01:00
Stefan Weigl-Bosker	7a2d46c85b	Revert "[AutoUpgrade] Prevent deletion of call if uses still exist (#177606 )" (#179340 ) This reverts commit 3007e2f050bd36e5e8dab68a5c9abbfbf4561314 (#177606) Buildbot: ``` Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure) ... [9/137] Linking CXX shared module unittests/Passes/Plugins/TestPlugin.so [10/137] Linking CXX executable bin/llvm-config [11/137] Building CXX object lib/IR/CMakeFiles/LLVMCore.dir/AutoUpgrade.cpp.o [12/137] Linking CXX static library lib/libLLVMCore.a [13/137] Generating VCSVersion.inc [14/135] Linking CXX executable bin/apinotes-test [15/135] Linking CXX executable bin/llvm-cxxmap [16/135] Linking CXX executable bin/llvm-bcanalyzer [17/135] Linking CXX executable bin/llvm-ctxprof-util [18/135] Linking CXX executable bin/llvm-objcopy FAILED: bin/llvm-objcopy : && /usr/bin/clang++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fuse-ld=lld -Wl,--color-diagnostics -Wl,--gc-sections -Xlinker --dependency-file=tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/link.d tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/ObjcopyOptions.cpp.o tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/llvm-objcopy.cpp.o tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/llvm-objcopy-driver.cpp.o -o bin/llvm-objcopy -Wl,-rpath,"\$ORIGIN/../lib:" lib/libLLVMObject.a lib/libLLVMObjCopy.a lib/libLLVMOption.a lib/libLLVMSupport.a lib/libLLVMTargetParser.a lib/libLLVMMC.a 
lib/libLLVMBinaryFormat.a lib/libLLVMIRReader.a lib/libLLVMBitReader.a lib/libLLVMAsmParser.a lib/libLLVMCore.a lib/libLLVMRemarks.a lib/libLLVMBitstreamReader.a lib/libLLVMMCParser.a lib/libLLVMTextAPI.a lib/libLLVMDebugInfoDWARFLowLevel.a -lrt -ldl -lm /usr/lib/aarch64-linux-gnu/libz.so /usr/lib/aarch64-linux-gnu/libzstd.so lib/libLLVMDemangle.a && : ld.lld: error: undefined symbol: llvm::Value::dump() const >>> referenced by AutoUpgrade.cpp >>> AutoUpgrade.cpp.o:(reportFatalUsageErrorWithCI(llvm::StringRef, llvm::CallBase*)) in archive lib/libLLVMCore.a clang++: error: linker command failed with exit code 1 (use -v to see invocation) ```	2026-02-02 17:12:04 -05:00
Kshitij Paranjape	3007e2f050	[AutoUpgrade] Prevent deletion of call if uses still exist (#177606 ) The calls to the llvm.x86.sse2.pshuflw are being deleted due to invalid vector type, even though uses still exist. Adding checks to prevent deletion of call when uses still exist or even if eraseFromParent() is called ensuring it is called after replaceAllUsesWith(). Fixes: #176674	2026-02-02 16:11:13 -05:00
Craig Topper	05e2ee9664	[RISCV] Replace riscv.clmul intrinsic with llvm.clmul (#178092 ) I did not replace riscv.clmulh/clmulr since those require a multiple instruction pattern match. I wanted to ensure that -O0 will select the correct instructions without relying on combines.	2026-01-26 21:12:48 -08:00
Matt Arsenault	0d4a35d560	IR: Remove llvm.convert.to.fp16 and llvm.convert.from.fp16 intrinsics (#174484 ) These are long overdue for removal. These were originally a hack to support loading half values before there was any / decent support for the half type through the backend. There's no reason to continue supporting these, they're equivalent to fpext/fptrunc with a bitcast. SelectionDAG stopped translating these directly, and used the bitcast + fp cast since f7a02c17628e825, so there's been no reason to use these since 2014.	2026-01-21 09:50:28 +00:00
Jonas Paulsson	8eccda10d2	[SystemZ] Add SP alignment to the DataLayout string. (#176041 ) Add '-S64' to the SystemZ datalayout string, to avoid overalignment of stack objects. Fixes #173402	2026-01-20 09:54:47 -06:00
Srinivasa Ravi	13205c51fc	[clang][NVPTX] Add missing half-precision add/mul/fma intrinsics (#170079 ) This change adds the following missing half-precision add/sub/fma intrinsics for the NVPTX target: - `llvm.nvvm.add.rn{.ftz}.sat.f16` - `llvm.nvvm.add.rn{.ftz}.sat.v2f16` - `llvm.nvvm.mul.rn{.ftz}.sat.f16` - `llvm.nvvm.mul.rn{.ftz}.sat.v2f16` - `llvm.nvvm.fma.rn.oob.*` We lower `fneg` followed by one of the above addition intrinsics to the corresponding `sub` instruction. This also removes some incorrect `bf16` fma intrinsics with no valid lowering. PTX spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions	2026-01-20 17:56:55 +05:30
Mikołaj Piróg	d03ce72f40	[IR] Propagate fast-math flags through autoupgraded target intrinsics (#174432 ) Fast-math flags were not copied through upgrades; they are now.	2026-01-15 21:15:14 +01:00
Alex MacLean	bc8fcba3bb	[NVPTX][AutoUpgrade] Use integer min/max intrinsics instead of icmp, select (#173097 )	2026-01-07 12:28:48 -08:00
Shilei Tian	5a63367b15	Reapply "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674 ) (#174697 ) This reverts commit 0b2f3cfb72a76fa90f3ec2a234caabe0d0712590.	2026-01-07 06:12:19 +00:00
dyung	0b2f3cfb72	Revert "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674 ) Reverts llvm/llvm-project#174310 This change is causing 2 cross-project-test failures on https://lab.llvm.org/buildbot/#/builders/174/builds/29695	2026-01-07 01:18:23 +00:00
Shilei Tian	ccca3b8c67	[AMDGPU] Rework the clamp support for WMMA instructions (#174310 ) Fixes #166989.	2026-01-06 15:46:40 -05:00
Luke Lau	ad4bfac732	[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796 ) This PR implements the first change outlined in https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel In order to allow non-immediate offsets in the llvm.vector.splice intrinsic, we need to separate out the "shift left" and "shift right" modes into two separate intrinsics, which were previously determined by whether or not the offset is positive or negative. The description in the LangRef has also been reworded in terms of sliding elements left or right and extracting either the upper or lower half as opposed to extracting from a certain index, which brings it inline with the definition of `llvm.fshr.`/`llvm.fshl.`. This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into their new equivalent one based on their offset, so existing uses of vector.splice should still work. Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced in this PR to keep the diff small and kick the tyres on the AutoUpgrader a bit. I planned to do this in a follow up NFC but can include it in this PR if reviewers prefer. Similarly the shuffle costing kind `SK_Splice` has just been kept the same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight` later.	2026-01-06 15:41:26 +08:00
Shilei Tian	c97de4387b	Revert "[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069 )" (#174303 ) This reverts commit 2c376ffeca490a5732e4fd6e98e5351fcf6d692a because it breaks assembler. ``` $ llvm-mc -triple=amdgcn -mcpu=gfx1250 -show-encoding <<< "v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] matrix_b_reuse" v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] clamp ; encoding: [0x10,0x80,0x72,0xcc,0x00,0x11,0x42,0x1c] ``` We have a fundamental issue in the clamp support in VOP3P instructions, which will need more changes.	2026-01-04 02:13:21 +00:00
Muhammad Abdul	2c376ffeca	[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069 ) Fixes #166989 - Adds a clamp immediate operand to the AMDGPU WMMA iu8 intrinsic and threads it through LLVM IR, MIR lowering, Clang builtins/tests, and MLIR ROCDL dialect so all layers agree on the new operand - Updates AMDGPUWmmaIntrinsicModsAB so the clamp attribute is emitted, teaches VOP3P encoding to accept the immediate, and adjusts Clang codegen/builtin headers plus MLIR op definitions and tests to match - Documents what the WMMA clamp operand does - Implements bitcode AutoUpgrade for source compatibility on the WMMA IU8 intrinsic op Possible future enhancements: - infer clamping as an optimization fold based on the use context Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-12-27 12:51:29 -05:00
Kevin Per	fc1fd1065b	[AutoUpgrade]: Fixed assertion by considering number of args (#172911 ) The assertion was violated because the intrinsic had too many arguments. In that case, fall back to the default handling. Closes https://github.com/llvm/llvm-project/issues/172817	2025-12-19 10:02:20 +00:00
Alex MacLean	a40f444265	[NVPTX] Add support for barrier.cta.red.* instructions (#172541 ) This change adds full support for the ptx `barrier.cta.red` instruction, following the same conventions as are already used for `barrier.cta.sync` and `barrier.cta.arrive`. In addition this MR removes the following intrinsics which are no longer needed: * llvm.nvvm.barrier0.popc --> llvm.nvvm.barrier.cta.red.popc.aligned.all(0, c) * llvm.nvvm.barrier0.and --> llvm.nvvm.barrier.cta.red.and.aligned.all(0, z) * llvm.nvvm.barrier0.or --> llvm.nvvm.barrier.cta.red.or.aligned.all(0, z)	2025-12-18 18:06:27 -08:00
Nikita Popov	b7c0452a9a	[PowerPC][AIX] Specify correct ABI alignment for double (#144673 ) Add `f64:32:64` to the data layout for AIX, to indicate that doubles have a 32-bit ABI alignment and 64-bit preferred alignment. Clang was already taking this into account, but it was not reflected in LLVM's data layout. A notable effect of this change is that `double` loads/stores with 4 byte alignment are no longer considered "unaligned" and avoid the corresponding unaligned access legalization. I assume that this is correct/desired for AIX. (The codegen previously already relied on this in some places related to the call ABI simply by dint of assuming certain stack locations were 8 byte aligned, even though they were only actually 4 byte aligned.) Fixes https://github.com/llvm/llvm-project/issues/133599.	2025-12-11 08:57:26 +01:00
anjenner	27651133e2	AMDGPU: Drop and upgrade llvm.amdgcn.atomic.csub/cond.sub to atomicrmw (#105553 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2025-12-09 23:13:33 +00:00
BaiXilin	4f79552d25	[x86][AVX-VNNI] Fix VPDPWXXD Argument Types (#169456 ) Fixed the argument types of the following intrinsics to match with the ISA: - vpdpwssd_128, vpdpwssd_256, vpdpwssd_512 - vpdpwssds_128, vpdpwssds_256, vpdpwssds_512 - vpdpwsud_128, vpdpwsud_256, vpdpwsud_512 - vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512 - vpdpwusd_128, vpdpwusd_256, vpdpwusd_512 - vpdpwusds_128, vpdpwusds_256, vpdpwusds_512 - vpdpwuud_128, vpdpwuud_256, vpdpwuud_512 - vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512 Fixes #97271. Note that this is the last PR for the issue.	2025-12-09 17:10:20 +00:00
Paul Walker	b5a3b8b704	[LLVM][SVE] Remove aarch64.sve.rev intrinsic, using vector.reverse instead. (#169654 )	2025-11-28 11:59:34 +00:00
Jakub Kuderski	4c21d0cb14	[ADT] Prepare to deprecate variadic `StringSwitch::Cases`. NFC. (#166020 ) Update all uses of variadic `.Cases` to use the initializer list overload instead. I plan to mark variadic `.Cases` as deprecated in a followup PR. For more context, see https://github.com/llvm/llvm-project/pull/163117.	2025-11-02 00:12:33 +00:00
Alex MacLean	4a383f9ff7	[NVPTX] Add ex2.approx bf16 support and cleanup intrinsic definition (#165446 )	2025-11-01 17:51:17 +00:00
Nikita Popov	12bf1836de	[AutoUpgrade] Gracefully handle invalid alignment on masked intrinsics Generate a usage error instead of asserting.	2025-10-22 12:47:26 +02:00
Daniel Kiss	048070ba6f	[ARM][AArch64] BTI,GCS,PAC Module flag update. (#86212 ) Module flag is used to indicate the feature to be propagated to the function. As now the frontend emits all attributes accordingly let's help the auto upgrade to only do work when old and new bitcodes are merged. Depends on #82819 and #86031	2025-10-22 09:29:06 +02:00
Nikita Popov	573ca36753	[IR] Replace alignment argument with attribute on masked intrinsics (#163802 ) The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).	2025-10-20 08:50:09 +00:00
Joseph Huber	728e925476	[AMDGPU] Auto-upgrade ELF mangling in the data layout (#163644 ) Summary: The changes in https://github.com/llvm/llvm-project/pull/163011 caused all ELF platforms to default to ELF mangling. We want to auto upgrade this for linking in new programs to old ones.	2025-10-17 09:00:42 -05:00
BaiXilin	0d9dd60815	[x86][AVX-VNNI] Fix VPDPBXXD Argument Type (#159222 ) Fixed intrinsic VPDP[SS,SU,UU]D[,S]_128/256/512's argument types to match with the ISA. Fixes part of #97271.	2025-09-30 09:41:12 +00:00
Sander de Smalen	17e008db17	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637 ) The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.	2025-09-17 11:44:47 +01:00
BaiXilin	94e2c19f86	[x86][AVX-VNNI] Fix VPDPBUSD Argument Types (#155194 ) Fixed intrinsic VPDPBUSD[,S]_128/256/512's argument types to match with the ISA. Fixes part of #97271	2025-09-10 12:24:16 +00:00
Alexandre Ganea	5cda2424c8	[LLD][COFF] Add more `--time-trace` tags for ThinLTO linking (#156471 ) In order to better see what's going on during ThinLTO linking, this PR adds more profile tags when using `--time-trace` on a `lld-link.exe` invocation. After PR, linking `clang.exe`: (screenshot omitted) Linking a custom (Unreal Engine game) binary gives a completely different picture, probably because of using Unity files, and the sheer amount of input files (here, providing over 60 GB of .OBJs/.LIBs). (screenshot omitted)	2025-09-05 15:28:19 -04:00
Alex MacLean	06bcc34e3d	[NVPTX] Auto-upgrade nvvm.grid_constant to param attribute (#155489 ) Upgrade the !"grid_constant" !nvvm.annotation to a "nvvm.grid_constant" attribute. This attribute is much simpler for front-ends to apply and faster and simpler to query.	2025-08-27 16:32:28 -07:00
Kazu Hirata	07eb7b7692	[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068 ) This patch replaces SmallSet<T , N> with SmallPtrSet<T , N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.	2025-08-18 07:01:29 -07:00
Nikita Popov	02f3e95a42	[AutoUpgrade] Fix use after free Determine the intrinsic ID before the name is freed during renaming.	2025-08-08 11:54:09 +02:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Meredith Julian	be58069515	[LLVM][NVPTX] Upstream tanh intrinsic for libdevice (#149596 ) Currently __nv_fast_tanhf() in libdevice maps to an nvvm intrinsic that has not been upstreamed, which is causing issues when using the NVPTX backend from upstream. Instead of upstreaming the intrinsic, we can instead use the existing Intrinsic::tanh with the afn flag. This change adds NVPTX backend support for ISD::TANH, adds auto-upgrade for the old tanh_approx intrinsic to @llvm.tanh.f32 with afn flag so that libdevice works properly upstream, and adds a basic codegen test and a case to the auto-upgrade test.	2025-07-24 14:32:59 -07:00
Nikita Popov	92c55a315e	[IR] Only allow lifetime.start/end on allocas (#149310 ) lifetime.start and lifetime.end are primarily intended for use on allocas, to enable stack coloring and other liveness optimizations. This is necessary because all (static) allocas are hoisted into the entry block, so lifetime markers are the only way to convey the actual lifetimes. However, lifetime.start and lifetime.end are currently allowed to be used on non-alloca pointers. We don't actually do this in practice, but just the mere fact that this is possible breaks the core purpose of the lifetime markers, which is stack coloring of allocas. Stack coloring can only work correctly if all lifetime markers for an alloca are analyzable. * If a lifetime marker may operate on multiple allocas via a select/phi, we don't know which lifetime actually starts/ends and handle it incorrectly (https://github.com/llvm/llvm-project/issues/104776). * Stack coloring operates on the assumption that all lifetime markers are visible, and not, for example, hidden behind a function call or escaped pointer. It's not possible to change this, as part of the purpose of lifetime markers is that they work even in the presence of escaped pointers, where simple use analysis is insufficient. I don't think there is any way to have coherent semantics for lifetime markers on allocas, while also permitting them on arbitrary pointer values. This PR restricts lifetimes to operate on allocas only. As a followup, I will also drop the size argument, which is superfluous if we always operate on an alloca. (This change also renders various code handling lifetime markers on non-alloca dead. I plan to clean up that kind of code after dropping the size argument as well.) In practice, I've only found a few places that currently produce lifetimes on non-allocas: * CoroEarly replaces the promise alloca with the result of an intrinsic, which will later be replaced back with an alloca. 
I think this is the only place where there is some legitimate loss of functionality, but I don't think this is particularly important (I don't think we'd expect the promise in a coroutine to admit useful lifetime optimization.) * SafeStack moves unsafe allocas onto a separate frame. We can safely drop lifetimes here, as SafeStack performs its own stack coloring. * Similar for AddressSanitizer, it also moves allocas into separate memory. * LSR sometimes replaces the lifetime argument with a GEP chain of the alloca (where the offsets ultimately cancel out). This is just unnecessary. (Fixed separately in https://github.com/llvm/llvm-project/pull/149492.) * InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast of an alloca. I don't think this is necessary.	2025-07-21 15:04:50 +02:00
David Green	9fcea2e465	[ARM] Add neon vector support for roundeven As per #142559, this marks froundeven as legal for Neon and upgrades the existing arm.neon.vrintn intrinsics.	2025-07-04 15:27:33 +01:00
David Green	ec35065789	[ARM] Add neon vector support for rint As per #142559, this marks frint as legal for Neon and upgrades the existing arm.neon.vrintx intrinsics.	2025-07-03 21:27:48 +01:00
David Green	1f8f477bd0	[ARM] Add neon vector support for trunc As per #142559, this marks ftrunc as legal for Neon and upgrades the existing arm.neon.vrintz intrinsics.	2025-07-03 07:41:13 +01:00
David Green	5332534b9c	[ARM] Add neon vector support for ceil As per #142559, this marks fceil as legal for Neon and upgrades the existing arm.neon.vrintp intrinsics.	2025-07-01 15:41:10 +01:00
David Green	6bd9ff04af	[ARM] Add neon vector support for round As per #142559, this marks fround as legal for Neon and upgrades the existing arm.neon.vrinta intrinsics.	2025-06-30 17:15:26 +01:00
David Green	dcc9e36b18	[ARM] Add neon vector support for floor (#142559 ) This marks ffloor as legal providing that armv8 and neon is present (or fullfp16 for the fp16 instructions). The existing arm_neon_vrintm intrinsics are auto-upgraded to llvm.floor. If this is OK I will update the other vrint intrinsics.	2025-06-29 11:37:16 +01:00
Nikita Popov	9a6a87da6e	[AutoUpgrade] Remove unnecessary name check (NFCI) If only the name is incorrect (due to added overload), but the signature is correct, we should go through the generic remangling upgrade.	2025-06-23 14:56:24 +02:00
Durgadoss R	3e5d50f9c6	[NVPTX] Add cta_group support to TMA G2S intrinsics (#143178 ) This patch extends the TMA G2S intrinsics with the support for cta_group::1/2 available from Blackwell onwards. The existing intrinsics are auto-upgraded with a default value of '0' for the `cta_group` flag operand. * lit tests are added for all combinations of the newer variants. * Negative tests are added to validate the error-handling when the value of the cta_group flag falls out-of-range. * The generated PTX is verified with a 12.8 ptxas executable. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-06-12 15:20:39 +05:30
Jeremy Morse	459475020a	Reapply 76197ea6f91f after removing an assertion Specifically this is the assertion in BasicBlock.cpp. Now that we're not examining or setting that flag consistently (because it'll be deleted in about an hour) there's no need to keep this assertion. Original commit title: [DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)	2025-06-11 17:35:29 +01:00
Jeremy Morse	76197ea6f9	Revert "[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451 )" This reverts commit c71a2e688828ab3ede4fb54168a674ff68396f61. /me squints -- this is hitting an assertion I thought had been deleted, will revert and investigate for a bit.	2025-06-11 14:52:17 +01:00
Jeremy Morse	c71a2e6888	[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451 ) These are opportunistic deletions as more places that make use of the IsNewDbgInfoFormat flag are removed. It should (TM)(R) all be dead code now that `IsNewDbgInfoFormat` should be true everywhere. - FastISel: we don't need to do debug-aware instruction counting any more, because there are no debug instructions - Autoupgrade: you can no longer avoid autoupgrading of intrinsics to records - DIBuilder: Delete the code for creating debug intrinsics (!) - LoopUtils: No need to handle debug instructions, they don't exist	2025-06-11 14:43:15 +01:00
Jeremy Morse	3d7aa961ac	[DebugInfo][RemoveDIs] Use autoupgrader to convert old debug-info (#143452 ) By chance, two things have prevented the autoupgrade path being exercised much so far: * LLParser setting the debug-info mode to "old" on seeing intrinsics, * The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info block. In practice, this appears to mean this code path hasn't seen the various invalid inputs that can come its way. This commit does a number of things: * Tolerates the various illegal inputs that can be written with debug-intrinsics, and that must be tolerated until the Verifier runs, * Printing illegal/null DbgRecord fields must succeed, * Verifier errors need to localise the function/block where the error is, * Tests that now see debug records will print debug-record errors, Plus a few new tests for other intrinsic-to-debug-record failures modes I found. There are also two edge cases: * Some of the unit tests switch back and forth between intrinsic and record modes at will; I've deleted coverage and some assertions to tolerate this as intrinsic support is now Gone (TM), * In sroa-extract-bits.ll, the order of debug records flips. This is because the autoupgrader upgrades in the opposite order to the basic block conversion routines... which doesn't change the record order, but _does_ change the use list order in Metadata! This should (TM) have no consequence to the correctness of LLVM, but will change the order of various records and the order of DWARF record output too. I tried to reduce this patch to a smaller collection of changes, but they're all intertwined, sorry.	2025-06-11 13:56:30 +01:00
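
The vector.splice split in commit ad4bfac732 above can be modelled with plain lists. This is only a sketch in Python, not LLVM code: the function names are hypothetical, and the semantics are an assumption based on that commit message (a positive legacy offset becomes a left splice, a negative one becomes a right splice with the absolute value) and on the legacy rule that the offset indexes into the concatenation of the two vectors.

```python
from typing import List, Sequence


def splice_legacy(v1: Sequence, v2: Sequence, imm: int) -> List:
    """Model of the old llvm.vector.splice (assumed semantics).

    A non-negative imm is the index of the first extracted element in
    concat(v1, v2); a negative imm takes the trailing -imm elements of v1
    followed by the leading elements of v2.
    """
    n = len(v1)
    if len(v2) != n or not (-n <= imm < n):
        raise ValueError("operands must match and -n <= imm < n")
    cat = list(v1) + list(v2)
    start = imm if imm >= 0 else n + imm
    return cat[start:start + n]


def splice_left(v1: Sequence, v2: Sequence, offset: int) -> List:
    """Hypothetical model of splice.left: slide concat(v1, v2) left by offset."""
    n = len(v1)
    cat = list(v1) + list(v2)
    return cat[offset:offset + n]


def splice_right(v1: Sequence, v2: Sequence, offset: int) -> List:
    """Hypothetical model of splice.right: slide concat(v1, v2) right by offset."""
    n = len(v1)
    cat = list(v1) + list(v2)
    return cat[n - offset:2 * n - offset]


def upgrade(v1: Sequence, v2: Sequence, imm: int) -> List:
    """Pick the new operation from the sign of the legacy offset."""
    if imm >= 0:
        return splice_left(v1, v2, imm)
    return splice_right(v1, v2, -imm)
```

Under these assumptions, `upgrade` agrees with `splice_legacy` for every legal offset, which is the property an auto-upgrade needs so that existing uses keep working.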


599 Commits