llvm-project

Author	SHA1	Message	Date
Akshay Deodhar	fab5b1858d	Reland "[NVPTX][AtomicExpandPass] Complete support for AtomicRMW in NVPTX (#176015 )" (#179553 ) This PR adds full support for atomicrmw in NVPTX. This includes: - Memory order and syncscope support (changes in AtomicExpandPass.cpp, NVPTXIntrinsics.td) - Script-generated tests for integer and atomic operations (atomicrmw.py, atomicrmw-sm.ll in tests/CodeGen/NVPTX). Existing atomics tests which are subsumed by these have been removed (atomics-sm.ll, atomics.ll, atomicrmw-expand.ll). - ~~Changes shouldExpandAtomicRMWInIR to take a constant argument: This is to allow some other TargetLowering constant-argument functions to call it. This change touches several backends. An alternative solution exists, but to me, this seems the "right" way.~~ Has been split out into https://github.com/llvm/llvm-project/pull/176073. Rebased. - NOTE: The initial load issued for atomicrmw emulation loops (and cmpxchg emulation loops) must be a strong load. Currently, AtomicExpandPass issues a weak load. Fixing this breaks several backends. I'm planning to follow up with a separate PR. Initially failed due to error: ptxas fatal : Value 'sm_60' is not defined for option 'gpu-name'. Updated RUN lines in atomicrmw-sm*.py to skip the ptxas-verify check if ptxas does not support that SM version.	2026-02-04 16:15:49 -08:00
Akshay Deodhar	8a62457c10	Revert "[NVPTX][AtomicExpandPass] Complete support for AtomicRMW in NVPTX (#176015 )" (#178329 ) This reverts commit 1d379d05d46d77b5f008349cc14de27dd055f4b9. This change breaks llvm-nvptx-nvidia-win in the buildbot.	2026-01-28 01:31:06 +00:00
Akshay Deodhar	1d379d05d4	[NVPTX][AtomicExpandPass] Complete support for AtomicRMW in NVPTX (#176015 ) This PR adds full support for atomicrmw in NVPTX. This includes: - Memory order and syncscope support (changes in AtomicExpandPass.cpp, NVPTXIntrinsics.td) - Script-generated tests for integer and atomic operations (atomicrmw.py, atomicrmw-sm.ll in tests/CodeGen/NVPTX). Existing atomics tests which are subsumed by these have been removed (atomics-sm.ll, atomics.ll, atomicrmw-expand.ll). - ~~Changes shouldExpandAtomicRMWInIR to take a constant argument: This is to allow some other TargetLowering constant-argument functions to call it. This change touches several backends. An alternative solution exists, but to me, this seems the "right" way.~~ Has been split out into https://github.com/llvm/llvm-project/pull/176073. Rebased. - NOTE: The initial load issued for atomicrmw emulation loops (and cmpxchg emulation loops) must be a strong load. Currently, AtomicExpandPass issues a weak load. Fixing this breaks several backends. I'm planning to follow up with a separate PR.	2026-01-27 15:41:21 -08:00
Rahul Joshi	26f962465e	[LLVM][CodeGen] Remove pass initialization calls from pass constructors (#173061 ) - Remove pass initialization calls from pass constructors. - For some passes, add the initialization to `initializeCodeGen` or `initializeGlobalISel`. - Remove redundant initializations from llc and X86 target for some passes.	2026-01-21 08:44:51 -08:00
Matt Arsenault	36801a5b80	AtomicExpand: Use LibcallLoweringInfo analysis (#176384 )	2026-01-18 07:12:41 +01:00
Usman Nadeem	1ea201d73b	[WoA] Remove extra barriers after ARM LSE instructions with MSVC (#169596 ) `c9821abfc0` added extra fences after sequentially consistent stores for compatibility with MSVC's seq_cst loads (ldr+dmb). These extra fences should not be needed for ARM LSE instructions that have both acquire+release semantics, which results in a two way barrier, and should be enough for sequential consistency. Fixes https://github.com/llvm/llvm-project/issues/162345 Change-Id: I9148c73d0dcf3bf1b18a0915f96cac71ac1800f2	2025-12-15 17:19:40 -08:00
Nikita Popov	b0bd8bdbd8	[AtomicExpand] Use getSigned() for negative value	2025-12-09 16:13:59 +01:00
Tim Gymnich	af0b6b18a8	[ProfCheck][NFC] fix argument order for call to setExplicitlyUnknownBranchWeightsIfProfiled (#166601 )	2025-11-05 19:16:03 +01:00
Jin Huang	efa7ab06eb	[profcheck] Add unknown branch weights to expanded cmpxchg loop. (#165841 ) The AtomicExpandPass is responsible for lowering high-level atomic operations (like `atomicrmw fadd`) that are unsupported by the target hardware into a cmpxchg retry loop. Given that we cannot empirically prove the precise branch weights, it uses the `setExplicitlyUnknownBranchWeightsIfProfiled` function to explicitly add "unknown" (50/50) branch weights to this branch. This PR includes fixes for the following tests: ``` Transforms/AtomicExpand/AArch64/atomicrmw-fp.ll Transforms/AtomicExpand/AArch64/pcsections.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-agent.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-system.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-agent.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-system.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-nand.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-simplify-cfg-CAS-block.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-v2bf16-agent.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-v2bf16-system.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-v2f16-agent.ll Transforms/AtomicExpand/AMDGPU/expand-atomic-v2f16-system.ll Transforms/AtomicExpand/AMDGPU/expand-atomicrmw-fp-vector.ll Transforms/AtomicExpand/ARM/atomicrmw-fp.ll Transforms/AtomicExpand/LoongArch/atomicrmw-fp.ll Transforms/AtomicExpand/Mips/atomicrmw-fp.ll Transforms/AtomicExpand/PowerPC/atomicrmw-fp.ll Transforms/AtomicExpand/RISCV/atomicrmw-fp.ll Transforms/AtomicExpand/SPARC/libcalls.ll Transforms/AtomicExpand/X86/expand-atomic-rmw-fp.ll Transforms/AtomicExpand/X86/expand-atomic-rmw-initial-load.ll Transforms/AtomicExpand/X86/expand-atomic-xchg-fp.ll ``` Co-authored-by: Jin Huang <jingold@google.com>	2025-11-05 09:33:09 -08:00
Mircea Trofin	ff108f7486	Fix failures introduced in #166032 (#166574 )	2025-11-05 08:02:55 -08:00
Jin Huang	fa5cd27ef0	[profcheck] Add unknown branch weights to expand LL/SR loop. (#166273 ) As a follow-up to PR #165841, this change addresses `prof_md` metadata loss in AtomicExpandPass when lowering `atomicrmw xchg` to a Load-Linked/Store-Exclusive (LL/SC) loop. This path is distinct from the LSE path addressed previously: PR #165841 (and its tests) used `-mtriple=aarch64-linux-gnu`, which targets a modern ARMv8.1+ architecture. This architecture supports Large System Extensions (LSE), allowing `atomicrmw` to be lowered directly to a more efficient hardware instruction. This PR (and its tests) uses `-mtriple=aarch64--` or `-mtriple=armv8-linux-gnueabihf`. This indicates an ARMv8.0 or lower architecture that does not support LSE. On these targets, the pass must fall back to synthesizing a manual LL/SC loop using the `ldaxr/stxr` instruction pair. Similar to the previous issue, the new conditional branch was failing to inherit the `prof_md` metadata. This PR correctly attaches the branch weights to the newly created branch within the LL/SC loop, ensuring profile information is preserved. Co-authored-by: Jin Huang <jingold@google.com>	2025-11-04 16:23:34 -08:00
Kazu Hirata	358513f662	[llvm] Replace LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]] (NFC) (#163330 ) This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]] where we do not need to move the position of [[maybe_unused]] within declarations. Notes: - [[maybe_unused]] is a standard feature of C++17. - The compiler is far more lenient about the placement of __attribute__((unused)) than that of [[maybe_unused]]. I'll follow up with another patch to finish up the rest.	2025-10-14 07:15:44 -07:00
zhijian lin	36cb33bbca	support branch hint for AtomicExpandImpl::expandAtomicCmpXchg (#152366 ) The patch adds a branch hint for AtomicExpandImpl::expandAtomicCmpXchg. For example, PowerPC supports branch hints as follows: ``` loop: lwarx r6,0,r3 # load and reserve cmpw r4,r6 #1st 2 operands equal? bne- exit #skip if not stwcx. r5,0,r3 #store new value if still res’ved bne- loop #loop if lost reservation exit: mr r4,r6 #return value from storage ``` `-` hints not taken, `+` hints taken.	2025-09-02 09:33:28 -04:00
Pierre van Houtryve	8b9b0fdedf	[CodeGen][TLI] Allow targets to custom expand atomic load/stores (#154708 ) Loads didn't have the `Expand` option in `AtomicExpandPass`. Stores had `Expand` but it didn't defer to TLI and instead did an action directly. Add a `CustomExpand` option and make it always map to the TLI hook for all cases. The `Expand` option now refers to a generic expansion for all targets.	2025-08-28 09:58:10 +02:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Matt Arsenault	ed1ee9a9bf	AtomicExpand: Stop using report_fatal_error (#147300 ) Emit a context error and delete the instruction. This allows removing the AMDGPU hack where some atomic libcalls are falsely added. NVPTX also later copied the same hack, so remove it there too. For now just emit the generic error, which is not good. It's missing any useful context information (despite taking the instruction). It's also confusing in the failed atomicrmw case, since it's reporting failure at the intermediate failed cmpxchg instead of the original atomicrmw.	2025-07-09 15:28:10 +09:00
AZero13	7119a0f39b	[AtomicExpandPass] Match isIdempotentRMW with InstcombineRMW (#142277 ) Add umin, smin, umax, smax to isIdempotentRMW	2025-06-08 11:32:20 +01:00
Jonathan Thackray	6e49f73825	Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701 ) This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.` and `llvm.minimum.` instructions, but are atomic and use IEEE754 2019 handling for NaNs, which is different to `fmax` and `fmin`. See: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.	2025-04-30 22:06:37 +01:00
Jonathan Thackray	7ee0097b48	Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657 ) Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4	2025-04-28 16:53:36 +01:00
Jonathan Thackray	ba420d8122	[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759 ) This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.` and `llvm.minimum.` instructions, but are atomic and use IEEE754 2019 handling for NaNs, which is different to `fmax` and `fmin`. See: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.	2025-04-28 15:31:44 +01:00
Akshay Deodhar	9638d08af9	[NVPTX] Support for memory orderings for cmpxchg (#126159 ) So far, all cmpxchg instructions were lowered to atom.cas. This change adds support for memory orders in lowering. Specifically: - For cmpxchg which are emulated, memory ordering is enforced by adding fences around the emulation loops. - For cmpxchg which are lowered to PTX directly, where the memory order is supported in ptx, lower directly to the correct ptx instruction. - For seq_cst cmpxchg which are lowered to PTX directly, use a sequence (fence.sc; atom.cas.acquire) to provide the semantics that we want. Also adds tests for all possible combinations of (size, memory ordering, address space, SM/PTX versions) This also adds `atomicOperationOrderAfterFenceSplit` in TargetLowering, for specially handling seq_cst atomics.	2025-02-24 10:13:23 -08:00
Matt Arsenault	04d450fd8d	AtomicExpand: Preserve metadata when bitcasting fp atomicrmw xchg (#115240 )	2024-11-13 12:51:18 -08:00
Kazu Hirata	735ab61ac8	[CodeGen] Remove unused includes (NFC) (#115996 ) Identified with misc-include-cleaner.	2024-11-12 23:15:06 -08:00
Matt Arsenault	30dd1297fa	AMDGPU: Custom expand flat cmpxchg which may access private (#109410 ) 64-bit flat cmpxchg instructions do not work correctly for scratch addresses, and need to be expanded as non-atomic. Allow custom expansion of cmpxchg in AtomicExpand, as is already the case for atomicrmw.	2024-11-04 09:29:38 -08:00
Matt Arsenault	9cc298108a	AtomicExpand: Copy metadata from atomicrmw to cmpxchg (#109409 ) When expanding an atomicrmw with a cmpxchg, preserve any metadata attached to it. This will avoid unwanted double expansions in a future commit. The initial load should also probably receive the same metadata (which for some reason is not emitted as an atomic).	2024-10-31 11:54:07 -07:00
Matt Arsenault	5326614e2f	AtomicExpand: Really allow incremental legalization (#108613 ) Fix up 100d9b89947bb1d42af20010bb594fa4c02542fc. The iterator fixes ended up defeating the point, since newly inserted blocks were not visited. This never erases the current block, so we can simply not preincrement the block iterator. The AArch64 FP atomic tests now expand the cmpxchg in the second round of legalization.	2024-09-20 08:18:33 +04:00
anjenner	4af249fe6e	Add usub_cond and usub_sat operations to atomicrmw (#105568 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2024-09-06 16:19:20 +01:00
Matt Arsenault	100d9b8994	Reapply "AtomicExpand: Allow incrementally legalizing atomicrmw" (#107307 ) This reverts commit 63da545ccdd41d9eb2392a8d0e848a65eb24f5fa. Use reverse iteration in the instruction loop to avoid sanitizer errors. This also has the side effect of avoiding the AArch64 codegen quality regressions. Closes #107309	2024-09-06 18:37:34 +04:00
Vitaly Buka	63da545ccd	Revert "Reland "AtomicExpand: Allow incrementally legalizing atomicrmw"" (#107307 ) Reverts llvm/llvm-project#106793 `Next == E` is not enough: https://lab.llvm.org/buildbot/#/builders/169/builds/2834 `Next` is deleted by `processAtomicInstr`	2024-09-04 13:43:41 -07:00
Vitaly Buka	06286832db	Reland "Revert "AtomicExpand: Allow incrementally legalizing atomicrmw"" (#106793 ) Reverts llvm/llvm-project#106792 The first commit of PR is pure revert, the rest is a possible fix.	2024-09-04 10:29:03 +04:00
Vitaly Buka	982d2445f2	Revert "AtomicExpand: Allow incrementally legalizing atomicrmw" (#106792 ) Reverts llvm/llvm-project#103371 There is `heap-use-after-free`, commented on 206b5aff44a95754f6dd7a5696efa024e983ac59 Maybe `if (Next == E || BB != Next->getParent()) {` is enough, but not sure, what was the intent there,	2024-08-30 13:51:53 -07:00
Matt Arsenault	206b5aff44	AtomicExpand: Allow incrementally legalizing atomicrmw (#103371 ) If a lowering changed control flow, resume the legalization loop at the first newly inserted block. This will allow incrementally legalizing atomicrmw and cmpxchg. The AArch64 test might be a bugfix. Previously it would lower the vector FP case as a cmpxchg loop, but the resulting cmpxchgs were not lowered; now they are. Maybe it shouldn't be reporting cmpxchg for the expand type in the first place though.	2024-08-30 19:11:45 +04:00
Matt Arsenault	109b50808f	AtomicExpand: Add assert that atomicrmw is an xchg It turns out it's trivial to hit this path with any rmw operation.	2024-08-14 10:55:52 +04:00
Matt Arsenault	2d7a2c1212	AtomicExpand: Refactor atomic instruction handling (#102914 ) Move the processing of an instruction into a helper function. Also avoid redundant checking for all types of atomic instructions. Including the assert, it was effectively performing the same check 3 times.	2024-08-13 19:51:53 +04:00
Sergei Barannikov	4e93b16f3f	[llvm] Make InstSimplifyFolder constructor explicit (NFC) (#101654 )	2024-08-02 16:45:50 +03:00
Joseph Huber	615b7eeaa9	Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 )" This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5. I moved the `ISD` dependencies into the CodeGen portion of the handling, it's a little awkward but it's the easiest solution I can think of for now.	2024-07-20 09:29:31 -05:00
NAKAMURA Takumi	740161a9b9	Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 )" This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610	2024-07-20 12:36:57 +09:00
Joseph Huber	c05126bdfc	[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 ) Summary: The LTO pass and LLD linker have logic in them that forces extraction and prevents internalization of needed runtime calls. However, these currently take all RTLibcalls into account, even if the target does not support them. The target opts out of a libcall if it sets its name to nullptr. This patch pulls this logic out into a class in the header so that LTO / lld can use it to determine if a symbol actually needs to be kept. This is important for targets like AMDGPU that want to be able to use `lld` to perform the final link step, but do not want the overhead of uncalled functions. (This adds like a second to the link time trivially)	2024-07-16 06:22:09 -05:00
Nikita Popov	9df71d7673	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919 ) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.	2024-06-28 08:36:49 +02:00
Nikita Popov	2d209d964a	[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902 ) This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.	2024-06-27 16:38:15 +02:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Matt Arsenault	f3afdc4ad9	AtomicExpand: Fix creating invalid ptrmask for fat pointers (#94955 ) The ptrmask intrinsic requires the integer mask to be the index size, not the pointer size.	2024-06-12 10:45:42 +02:00
Matt Arsenault	d81170873c	AtomicExpand: Preserve metadata when expanding partword RMW (#89769 ) This will be important for AMDGPU in a future patch.	2024-05-23 10:04:47 +02:00
Matt Arsenault	7927bcdb8a	AMDGPU: Do not bitcast atomicrmw in IR (#90045 ) This is the first step to eliminating shouldCastAtomicRMWIInIR. This and the other atomic expand casting hooks should be removed. This adds duplicate legalization machinery and interfaces. This is already what codegen is supposed to do, and already does for the promotion case. In the case of atomicrmw xchg, there seems to be some benefit to having the bitcasts moved outside of the cmpxchg loop on targets with separate int and FP registers, which we should be able to deal with by directly checking for the legality of the underlying operation. The casting path was also losing metadata when it recreated the instruction.	2024-05-07 18:26:32 +02:00
Matt Arsenault	a45eb62877	AtomicExpand: Fix dropping a syncscope when bitcasting atomicrmw	2024-04-24 19:09:34 +02:00
Pierre van Houtryve	cf328ff96d	[IR] Memory Model Relaxation Annotations (#78569 ) Implements the core/target-agnostic components of Memory Model Relaxation Annotations. RFC: https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5	2024-04-24 08:52:25 +02:00
Matt Arsenault	31af5e9001	AtomicExpand: Emit or with constant on RHS This will save later code from commuting it.	2024-04-23 15:00:31 +02:00
Matt Arsenault	4cb110a84f	[RFC] IR: Support atomicrmw FP ops with vector types (#86796 ) Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x bfloat> on some targets and address spaces. Note this only supports the proper floating-point operations; float vector typed xchg is still not supported. cmpxchg still only supports integers, so this inserts bitcasts for the loop expansion. I have support for fp vector typed xchg, and vector of int/ptr separately implemented but I don't have an immediate need for those beyond feature consistency.	2024-04-06 15:27:45 -04:00
Kevin P. Neal	fe893c93b7	[FPEnv][AtomicExpand] Correct strictfp attribute handling in AtomicExpandPass (#87082 ) The AtomicExpand pass was lowering function calls with the strictfp attribute to sequences that included function calls incorrectly lacking the attribute. This patch corrects that. The pass now also emits the correct constrained fp call instead of normal FP instructions when in a function with the strictfp attribute. Test changes verified with D146845.	2024-03-29 14:54:51 -04:00

192 Commits