llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	5f99854d01	[AMDGPU] Drop A and B neg modifier from amdgcn_wmma_bf16_16x16x32_bf16 (#189468 ) Fixes: LCOMPILER-1673	2026-03-30 14:14:22 -07:00
Stanislav Mekhanoshin	a2d84b5d8d	[AMDGPU] Remove neg support from 4 more gfx1250 WMMA (#189115 ) These were previously covered by AMDGPUWmmaIntrinsicModsAllReuse.	2026-03-27 15:20:14 -07:00
Stanislav Mekhanoshin	e69c7312f3	[AMDGPU] Disable neg_lo[0:1] and neg_hi[0:1] on wmma_f32_16x16x32_bf16 (#188649 ) This is the pilot change, the rest will follow the same idea.	2026-03-26 00:37:05 -07:00
Alex MacLean	a9775221ae	[NVPTX] Canonicalize NVVM attribute strings and refactor property queries (NFC) (#187752 )	2026-03-23 08:07:09 -07:00
Akshay Deodhar	8b265cf270	[NVPTX][AutoUpgrade] atom.load intrinsics should be autoupgraded to monotonic atomicrmw for NVPTX (#187140 ) Prior to https://github.com/llvm/llvm-project/pull/179553, the seq_cst qualifier was being ignored. The expected codegen for these intrinsics is `atom.relaxed`, which corresponds to `Monotonic`. The fix does to AutoUpgrade what https://github.com/llvm/llvm-project/pull/185822 does to clang.	2026-03-18 01:26:47 +00:00
Shilei Tian	f05d2e8a39	[AMDGPU] Make uniform-work-group-size a valueless attribute (#183925 ) The "uniform-work-group-size" function attribute previously took a string value of "true" or "false". Since presence alone can convey the "true" semantics and absence can convey "false", the value is unnecessary. This patch converts it to a valueless string attribute: presence indicates true, absence indicates false. For backward compatibility, auto-upgrade logic is added in both UpgradeAttributes (bitcode) and UpgradeFunctionAttributes: if the old value is "true", the attribute is kept without a value; if "false", the attribute is removed.	2026-03-01 21:29:55 +00:00
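The upgrade rule in the commit above can be sketched without any LLVM dependency. This is only an illustration under stated assumptions: `AttrMap` and `upgradeUniformWorkGroupSize` are hypothetical names, and the map stands in for a function's string attributes; it is not the code in `AutoUpgrade.cpp`.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a function's string attributes:
// attribute name -> optional value (no value means "valueless").
using AttrMap = std::map<std::string, std::optional<std::string>>;

// Apply the rule from the commit message: "true" keeps the attribute
// without a value, "false" removes the attribute entirely.
void upgradeUniformWorkGroupSize(AttrMap &Attrs) {
  auto It = Attrs.find("uniform-work-group-size");
  if (It == Attrs.end() || !It->second)
    return; // Absent or already valueless: nothing to upgrade.

  const std::string &V = *It->second;
  // Assumption of this sketch: the old form only carried "true" or "false".
  assert((V == "true" || V == "false") && "unexpected old attribute value");

  if (V == "true")
    It->second.reset(); // Presence alone now means true.
  else
    Attrs.erase(It);    // Absence now means false.
}
```

Running the function twice gives the same result as running it once, since an already valueless attribute is left alone.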
Shilei Tian	70905e0afa	[RFC][IR] Remove `Constant::isZeroValue` (#181521 ) `Constant::isZeroValue` currently behaves the same as `Constant::isNullValue` for all types except floating-point, where it additionally returns true for negative zero (`-0.0`). However, in practice, almost all callers operate on integer/pointer types where the two are equivalent, and the few FP-relevant callers have no meaningful dependence on the `-0.0` behavior. This PR removes `isZeroValue` to eliminate the confusing API. All callers are changed to `isNullValue` with no test failures. `isZeroValue` will be reintroduced in a future change with clearer semantics: when null pointers may have non-zero bit patterns, `isZeroValue` will check for bitwise-all-zeros, while `isNullValue` will check for the semantic null (which may be non-zero).	2026-02-15 12:06:42 -05:00
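The floating-point difference described in the commit above comes down to how `-0.0` is treated. A minimal sketch on plain `double` values follows; `isNullValueFP` and `isZeroValueFP` are hypothetical helpers that model the two predicates and are not LLVM APIs.

```cpp
#include <cmath>

// Models the FP case of Constant::isNullValue: only +0.0 qualifies.
bool isNullValueFP(double V) { return V == 0.0 && !std::signbit(V); }

// Models the removed Constant::isZeroValue: +0.0 and -0.0 both qualify,
// because IEEE comparison treats the two zeros as equal.
bool isZeroValueFP(double V) { return V == 0.0; }
```

The two helpers disagree only on `-0.0`, which is the single case the commit message calls out.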
Sam Elliott	0d08cb0e70	[outliners] Turn nooutline into an Enum Attribute (#163665 ) This change turns the `"nooutline"` attribute into an enum attribute called `nooutline`, and adds an auto-upgrader for bitcode to make the same change to existing IR. This IR attribute disables both the Machine Outliner (enabled at Oz for some targets), and the IR Outliner (disabled by default).	2026-02-10 21:44:17 -08:00
Matt Arsenault	2502e3b7ba	IR: Promote "denormal-fp-math" to a first class attribute (#174293 ) Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first class denormal_fpenv attribute. Previously the query for the effective denormal mode involved two string attribute queries with parsing. I'm introducing more uses of this, so it makes sense to convert this to a more efficient encoding. The old representation was also awkward since it was split across two separate attributes. The new encoding just stores the default and float modes as bitfields, largely avoiding the need to consider if the other mode is set. The syntax in the common cases looks like this: `denormal_fpenv(preservesign,preservesign)` `denormal_fpenv(float: preservesign,preservesign)` `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` I wasn't sure about reusing the float type name instead of adding a new keyword. It's parsed as a type but only accepts float. I'm also debating switching the name to subnormal to match the current preferred IEEE terminology (also used by nofpclass and other contexts). This has a behavior change when using the command flag debug options to set the denormal mode. The behavior of the flag ignored functions with an explicit attribute set, per the default and f32 version. Now that these are one attribute, the flag logic can't distinguish which of the two components were explicitly set on the function. Only one test appeared to rely on this behavior, so I just avoided using the flags in it. This also does not perform all the code cleanups this enables. In particular the attributor handling could be cleaned up. I also guessed at how to support this in MLIR. I followed MemoryEffects as a reference; it appears bitfields are expanded into arguments to attributes, so the representation there is a bit uglier with the 2 2-element fields flattened into 4 arguments.	2026-02-05 13:31:26 +00:00
Kshitij Paranjape	32cf905428	[AutoUpgrade] Handle invalid x86 intrinsics (#179374 ) Fixes #176674 Continuation of PR #177606.	2026-02-05 11:17:52 +01:00
Stefan Weigl-Bosker	7a2d46c85b	Revert "[AutoUpgrade] Prevent deletion of call if uses still exist (#177606 )" (#179340 ) This reverts commit 3007e2f050bd36e5e8dab68a5c9abbfbf4561314 (#177606) Buildbot: ``` Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure) ... [9/137] Linking CXX shared module unittests/Passes/Plugins/TestPlugin.so [10/137] Linking CXX executable bin/llvm-config [11/137] Building CXX object lib/IR/CMakeFiles/LLVMCore.dir/AutoUpgrade.cpp.o [12/137] Linking CXX static library lib/libLLVMCore.a [13/137] Generating VCSVersion.inc [14/135] Linking CXX executable bin/apinotes-test [15/135] Linking CXX executable bin/llvm-cxxmap [16/135] Linking CXX executable bin/llvm-bcanalyzer [17/135] Linking CXX executable bin/llvm-ctxprof-util [18/135] Linking CXX executable bin/llvm-objcopy FAILED: bin/llvm-objcopy : && /usr/bin/clang++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fuse-ld=lld -Wl,--color-diagnostics -Wl,--gc-sections -Xlinker --dependency-file=tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/link.d tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/ObjcopyOptions.cpp.o tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/llvm-objcopy.cpp.o tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/llvm-objcopy-driver.cpp.o -o bin/llvm-objcopy -Wl,-rpath,"\$ORIGIN/../lib:" lib/libLLVMObject.a lib/libLLVMObjCopy.a lib/libLLVMOption.a lib/libLLVMSupport.a lib/libLLVMTargetParser.a lib/libLLVMMC.a 
lib/libLLVMBinaryFormat.a lib/libLLVMIRReader.a lib/libLLVMBitReader.a lib/libLLVMAsmParser.a lib/libLLVMCore.a lib/libLLVMRemarks.a lib/libLLVMBitstreamReader.a lib/libLLVMMCParser.a lib/libLLVMTextAPI.a lib/libLLVMDebugInfoDWARFLowLevel.a -lrt -ldl -lm /usr/lib/aarch64-linux-gnu/libz.so /usr/lib/aarch64-linux-gnu/libzstd.so lib/libLLVMDemangle.a && : ld.lld: error: undefined symbol: llvm::Value::dump() const >>> referenced by AutoUpgrade.cpp >>> AutoUpgrade.cpp.o:(reportFatalUsageErrorWithCI(llvm::StringRef, llvm::CallBase*)) in archive lib/libLLVMCore.a clang++: error: linker command failed with exit code 1 (use -v to see invocation) ```	2026-02-02 17:12:04 -05:00
Kshitij Paranjape	3007e2f050	[AutoUpgrade] Prevent deletion of call if uses still exist (#177606 ) The calls to the llvm.x86.sse2.pshuflw are being deleted due to invalid vector type, even though uses still exist. Adding checks to prevent deletion of call when uses still exist or even if eraseFromParent() is called ensuring it is called after replaceAllUsesWith(). Fixes: #176674	2026-02-02 16:11:13 -05:00
Craig Topper	05e2ee9664	[RISCV] Replace riscv.clmul intrinsic with llvm.clmul (#178092 ) I did not replace riscv.clmulh/clmulr since those require a multiple instruction pattern match. I wanted to ensure that -O0 will select the correct instructions without relying on combines.	2026-01-26 21:12:48 -08:00
Matt Arsenault	0d4a35d560	IR: Remove llvm.convert.to.fp16 and llvm.convert.from.fp16 intrinsics (#174484 ) These are long overdue for removal. These were originally a hack to support loading half values before there was any / decent support for the half type through the backend. There's no reason to continue supporting these, they're equivalent to fpext/fptrunc with a bitcast. SelectionDAG stopped translating these directly, and used the bitcast + fp cast since f7a02c17628e825, so there's been no reason to use these since 2014.	2026-01-21 09:50:28 +00:00
Jonas Paulsson	8eccda10d2	[SystemZ] Add SP alignment to the DataLayout string. (#176041 ) Add '-S64' to the SystemZ datalayout string, to avoid overalignment of stack objects. Fixes #173402	2026-01-20 09:54:47 -06:00
Srinivasa Ravi	13205c51fc	[clang][NVPTX] Add missing half-precision add/mul/fma intrinsics (#170079 ) This change adds the following missing half-precision add/mul/fma intrinsics for the NVPTX target: - `llvm.nvvm.add.rn{.ftz}.sat.f16` - `llvm.nvvm.add.rn{.ftz}.sat.v2f16` - `llvm.nvvm.mul.rn{.ftz}.sat.f16` - `llvm.nvvm.mul.rn{.ftz}.sat.v2f16` - `llvm.nvvm.fma.rn.oob.*` We lower `fneg` followed by one of the above addition intrinsics to the corresponding `sub` instruction. This also removes some incorrect `bf16` fma intrinsics with no valid lowering. PTX spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions	2026-01-20 17:56:55 +05:30
Mikołaj Piróg	d03ce72f40	[IR] Propagate fast-math flags through autoupgraded target intrinsics (#174432 ) Fast-math flags were not copied through upgrades; they are now.	2026-01-15 21:15:14 +01:00
Alex MacLean	bc8fcba3bb	[NVPTX][AutoUpgrade] Use integer min/max intrinsics instead of icmp, select (#173097 )	2026-01-07 12:28:48 -08:00
Shilei Tian	5a63367b15	Reapply "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674 ) (#174697 ) This reverts commit 0b2f3cfb72a76fa90f3ec2a234caabe0d0712590.	2026-01-07 06:12:19 +00:00
dyung	0b2f3cfb72	Revert "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674 ) Reverts llvm/llvm-project#174310 This change is causing 2 cross-project-test failures on https://lab.llvm.org/buildbot/#/builders/174/builds/29695	2026-01-07 01:18:23 +00:00
Shilei Tian	ccca3b8c67	[AMDGPU] Rework the clamp support for WMMA instructions (#174310 ) Fixes #166989.	2026-01-06 15:46:40 -05:00
Luke Lau	ad4bfac732	[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796 ) This PR implements the first change outlined in https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel In order to allow non-immediate offsets in the llvm.vector.splice intrinsic, we need to separate out the "shift left" and "shift right" modes into two separate intrinsics, which were previously determined by whether the offset was positive or negative. The description in the LangRef has also been reworded in terms of sliding elements left or right and extracting either the upper or lower half as opposed to extracting from a certain index, which brings it in line with the definition of `llvm.fshr.`/`llvm.fshl.`. This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into their new equivalent based on their offset, so existing uses of vector.splice should still work. Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced in this PR to keep the diff small and kick the tyres on the AutoUpgrader a bit. I planned to do this in a follow up NFC but can include it in this PR if reviewers prefer. Similarly the shuffle costing kind `SK_Splice` has just been kept the same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight` later.	2026-01-06 15:41:26 +08:00
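The sign-based split described in the commit above can be modelled on plain vectors. This is a sketch only: `SpliceKind`, `UpgradedSplice`, `upgradeSplice` and `applySplice` are hypothetical names, and the mapping of a non-negative old offset to the "left" form and a negative one to the "right" form is an assumption, since the commit message does not spell it out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class SpliceKind { Left, Right };

struct UpgradedSplice {
  SpliceKind Kind;
  uint32_t Offset; // Always non-negative after the upgrade.
};

// Assumed mapping: old offset >= 0 selects "left", old offset < 0 selects
// "right" with the magnitude as the new offset.
UpgradedSplice upgradeSplice(int32_t Imm) {
  if (Imm >= 0)
    return {SpliceKind::Left, static_cast<uint32_t>(Imm)};
  return {SpliceKind::Right, static_cast<uint32_t>(-static_cast<int64_t>(Imm))};
}

// Reference semantics on concat(A, B): "left" starts reading at Offset,
// "right" starts Offset elements before the end of A.
std::vector<int> applySplice(const std::vector<int> &A,
                             const std::vector<int> &B, UpgradedSplice S) {
  assert(A.size() == B.size() && S.Offset <= A.size());
  const std::size_t N = A.size();
  std::vector<int> Concat(A);
  Concat.insert(Concat.end(), B.begin(), B.end());
  const std::size_t Start =
      S.Kind == SpliceKind::Left ? S.Offset : N - S.Offset;
  const auto First = Concat.begin() + static_cast<std::ptrdiff_t>(Start);
  return std::vector<int>(First, First + static_cast<std::ptrdiff_t>(N));
}
```

With `A = {1, 2, 3, 4}` and `B = {5, 6, 7, 8}`, old offsets `1` and `-3` both select the window that begins at the second element of `A`.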
Shilei Tian	c97de4387b	Revert "[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069 )" (#174303 ) This reverts commit 2c376ffeca490a5732e4fd6e98e5351fcf6d692a because it breaks assembler. ``` $ llvm-mc -triple=amdgcn -mcpu=gfx1250 -show-encoding <<< "v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] matrix_b_reuse" v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] clamp ; encoding: [0x10,0x80,0x72,0xcc,0x00,0x11,0x42,0x1c] ``` We have a fundamental issue in the clamp support in VOP3P instructions, which will need more changes.	2026-01-04 02:13:21 +00:00
Muhammad Abdul	2c376ffeca	[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069 ) Fixes #166989 - Adds a clamp immediate operand to the AMDGPU WMMA iu8 intrinsic and threads it through LLVM IR, MIR lowering, Clang builtins/tests, and MLIR ROCDL dialect so all layers agree on the new operand - Updates AMDGPUWmmaIntrinsicModsAB so the clamp attribute is emitted, teaches VOP3P encoding to accept the immediate, and adjusts Clang codegen/builtin headers plus MLIR op definitions and tests to match - Documents what the WMMA clamp operand does - Implements bitcode AutoUpgrade for source compatibility on WMMA IU8 Intrinsic op Possible future enhancements: - infer clamping as an optimization fold based on the use context --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-12-27 12:51:29 -05:00
Kevin Per	fc1fd1065b	[AutoUpgrade]: Fixed assertion by considering number of args (#172911 ) The assertion was violated because the intrinsic had too many arguments. In that case, fall back to the default handling. Closes https://github.com/llvm/llvm-project/issues/172817	2025-12-19 10:02:20 +00:00
Alex MacLean	a40f444265	[NVPTX] Add support for barrier.cta.red.* instructions (#172541 ) This change adds full support for the ptx `barrier.cta.red` instruction, following the same conventions as are already used for `barrier.cta.sync` and `barrier.cta.arrive`. In addition this MR removes the following intrinsics which are no longer needed: * llvm.nvvm.barrier0.popc --> llvm.nvvm.barrier.cta.red.popc.aligned.all(0, c) * llvm.nvvm.barrier0.and --> llvm.nvvm.barrier.cta.red.and.aligned.all(0, z) * llvm.nvvm.barrier0.or --> llvm.nvvm.barrier.cta.red.or.aligned.all(0, z)	2025-12-18 18:06:27 -08:00
Nikita Popov	b7c0452a9a	[PowerPC][AIX] Specify correct ABI alignment for double (#144673 ) Add `f64:32:64` to the data layout for AIX, to indicate that doubles have a 32-bit ABI alignment and 64-bit preferred alignment. Clang was already taking this into account, but it was not reflected in LLVM's data layout. A notable effect of this change is that `double` loads/stores with 4 byte alignment are no longer considered "unaligned" and avoid the corresponding unaligned access legalization. I assume that this is correct/desired for AIX. (The codegen previously already relied on this in some places related to the call ABI simply by dint of assuming certain stack locations were 8 byte aligned, even though they were only actually 4 byte aligned.) Fixes https://github.com/llvm/llvm-project/issues/133599.	2025-12-11 08:57:26 +01:00
anjenner	27651133e2	AMDGPU: Drop and upgrade llvm.amdgcn.atomic.csub/cond.sub to atomicrmw (#105553 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2025-12-09 23:13:33 +00:00
BaiXilin	4f79552d25	[x86][AVX-VNNI] Fix VPDPWXXD Argument Types (#169456 ) Fixed the argument types of the following intrinsics to match with the ISA: - vpdpwssd_128, vpdpwssd_256, vpdpwssd_512, - vpdpwssds_128, vpdpwssds_256, vpdpwssds_512 - vpdpwsud_128, vpdpwsud_256, vpdpwsud_512 - vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512 - vpdpwusd_128, vpdpwusd_256, vpdpwusd_512 - vpdpwusds_128, vpdpwusds_256, vpdpwusds_512 - vpdpwuud_128, vpdpwuud_256, vpdpwuud_512 - vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512 Fixes #97271. Note that this is the last PR for the issue.	2025-12-09 17:10:20 +00:00
Paul Walker	b5a3b8b704	[LLVM][SVE] Remove aarch64.sve.rev intrinsic, using vector.reverse instead. (#169654 )	2025-11-28 11:59:34 +00:00
Jakub Kuderski	4c21d0cb14	[ADT] Prepare to deprecate variadic `StringSwitch::Cases`. NFC. (#166020 ) Update all uses of variadic `.Cases` to use the initializer list overload instead. I plan to mark variadic `.Cases` as deprecated in a followup PR. For more context, see https://github.com/llvm/llvm-project/pull/163117.	2025-11-02 00:12:33 +00:00
Alex MacLean	4a383f9ff7	[NVPTX] Add ex2.approx bf16 support and cleanup intrinsic definition (#165446 )	2025-11-01 17:51:17 +00:00
Nikita Popov	12bf1836de	[AutoUpgrade] Gracefully handle invalid alignment on masked intrinsics Generate a usage error instead of asserting.	2025-10-22 12:47:26 +02:00
Daniel Kiss	048070ba6f	[ARM][AArch64] BTI,GCS,PAC Module flag update. (#86212 ) Module flag is used to indicate the feature to be propagated to the function. As now the frontend emits all attributes accordingly let's help the auto upgrade to only do work when old and new bitcodes are merged. Depends on #82819 and #86031	2025-10-22 09:29:06 +02:00
Nikita Popov	573ca36753	[IR] Replace alignment argument with attribute on masked intrinsics (#163802 ) The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).	2025-10-20 08:50:09 +00:00
Joseph Huber	728e925476	[AMDGPU] Auto-upgrade ELF mangling in the data layout (#163644 ) Summary: The changes in https://github.com/llvm/llvm-project/pull/163011 caused all ELF platforms to default to ELF mangling. We want to auto upgrade this for linking in new programs to old ones.	2025-10-17 09:00:42 -05:00
BaiXilin	0d9dd60815	[x86][AVX-VNNI] Fix VPDPBXXD Argument Type (#159222 ) Fixed intrinsic VPDP[SS,SU,UU]D[,S]_128/256/512's argument types to match with the ISA. Fixes part of #97271.	2025-09-30 09:41:12 +00:00
Sander de Smalen	17e008db17	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637 ) The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.	2025-09-17 11:44:47 +01:00
BaiXilin	94e2c19f86	[x86][AVX-VNNI] Fix VPDPBUSD Argument Types (#155194 ) Fixed intrinsic VPDPBUSD[,S]_128/256/512's argument types to match with the ISA. Fixes part of #97271	2025-09-10 12:24:16 +00:00
Alexandre Ganea	5cda2424c8	[LLD][COFF] Add more `--time-trace` tags for ThinLTO linking (#156471 ) In order to better see what's going on during ThinLTO linking, this PR adds more profile tags when using `--time-trace` on a `lld-link.exe` invocation. After PR, linking `clang.exe`: [image: Capture d’écran 2025-09-02 082021] Linking a custom (Unreal Engine game) binary gives a completely different picture, probably because of using Unity files, and the sheer amount of input files (here, providing over 60 GB of .OBJs/.LIBs). [image: Capture d’écran 2025-09-02 102048]	2025-09-05 15:28:19 -04:00
Alex MacLean	06bcc34e3d	[NVPTX] Auto-upgrade nvvm.grid_constant to param attribute (#155489 ) Upgrade the !"grid_constant" !nvvm.annotation to a "nvvm.grid_constant" attribute. This attribute is much simpler for front-ends to apply and faster and simpler to query.	2025-08-27 16:32:28 -07:00
Kazu Hirata	07eb7b7692	[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068 ) This patch replaces SmallSet<T , N> with SmallPtrSet<T , N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.	2025-08-18 07:01:29 -07:00
Nikita Popov	02f3e95a42	[AutoUpgrade] Fix use after free Determine the intrinsic ID before the name is freed during renaming.	2025-08-08 11:54:09 +02:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Meredith Julian	be58069515	[LLVM][NVPTX] Upstream tanh intrinsic for libdevice (#149596 ) Currently __nv_fast_tanhf() in libdevice maps to an nvvm intrinsic that has not been upstreamed, which is causing issues when using the NVPTX backend from upstream. Instead of upstreaming the intrinsic, we can instead use the existing Intrinsic::tanh with the afn flag. This change adds NVPTX backend support for ISD::TANH, adds auto-upgrade for the old tanh_approx intrinsic to @llvm.tanh.f32 with afn flag so that libdevice works properly upstream, and adds a basic codegen test and a case to the auto-upgrade test.	2025-07-24 14:32:59 -07:00
Nikita Popov	92c55a315e	[IR] Only allow lifetime.start/end on allocas (#149310 ) lifetime.start and lifetime.end are primarily intended for use on allocas, to enable stack coloring and other liveness optimizations. This is necessary because all (static) allocas are hoisted into the entry block, so lifetime markers are the only way to convey the actual lifetimes. However, lifetime.start and lifetime.end are currently allowed to be used on non-alloca pointers. We don't actually do this in practice, but just the mere fact that this is possible breaks the core purpose of the lifetime markers, which is stack coloring of allocas. Stack coloring can only work correctly if all lifetime markers for an alloca are analyzable. * If a lifetime marker may operate on multiple allocas via a select/phi, we don't know which lifetime actually starts/ends and handle it incorrectly (https://github.com/llvm/llvm-project/issues/104776). * Stack coloring operates on the assumption that all lifetime markers are visible, and not, for example, hidden behind a function call or escaped pointer. It's not possible to change this, as part of the purpose of lifetime markers is that they work even in the presence of escaped pointers, where simple use analysis is insufficient. I don't think there is any way to have coherent semantics for lifetime markers on allocas, while also permitting them on arbitrary pointer values. This PR restricts lifetimes to operate on allocas only. As a followup, I will also drop the size argument, which is superfluous if we always operate on an alloca. (This change also renders various code handling lifetime markers on non-alloca dead. I plan to clean up that kind of code after dropping the size argument as well.) In practice, I've only found a few places that currently produce lifetimes on non-allocas: * CoroEarly replaces the promise alloca with the result of an intrinsic, which will later be replaced back with an alloca. 
I think this is the only place where there is some legitimate loss of functionality, but I don't think this is particularly important (I don't think we'd expect the promise in a coroutine to admit useful lifetime optimization.) * SafeStack moves unsafe allocas onto a separate frame. We can safely drop lifetimes here, as SafeStack performs its own stack coloring. * Similar for AddressSanitizer, it also moves allocas into separate memory. * LSR sometimes replaces the lifetime argument with a GEP chain of the alloca (where the offsets ultimately cancel out). This is just unnecessary. (Fixed separately in https://github.com/llvm/llvm-project/pull/149492.) * InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast of an alloca. I don't think this is necessary.	2025-07-21 15:04:50 +02:00
David Green	9fcea2e465	[ARM] Add neon vector support for roundeven As per #142559, this marks froundeven as legal for Neon and upgrades the existing arm.neon.vrintn intrinsics.	2025-07-04 15:27:33 +01:00
David Green	ec35065789	[ARM] Add neon vector support for rint As per #142559, this marks frint as legal for Neon and upgrades the existing arm.neon.vrintx intrinsics.	2025-07-03 21:27:48 +01:00
David Green	1f8f477bd0	[ARM] Add neon vector support for trunc As per #142559, this marks ftrunc as legal for Neon and upgrades the existing arm.neon.vrintz intrinsics.	2025-07-03 07:41:13 +01:00
David Green	5332534b9c	[ARM] Add neon vector support for ceil As per #142559, this marks fceil as legal for Neon and upgrades the existing arm.neon.vrintp intrinsics.	2025-07-01 15:41:10 +01:00

607 Commits