184 Commits

Author SHA1 Message Date
Jin Huang
efa7ab06eb
[profcheck] Add unknown branch weights to expanded cmpxchg loop. (#165841)
The AtomicExpandPass is responsible for lowering high-level atomic
operations (like `atomicrmw fadd`) that are unsupported by the target
hardware into a cmpxchg retry loop.

Given that we cannot empirically prove the precision branch weights, It
uses the `setExplicitlyUnknownBranchWeightsIfProfiled` function to
explicitly add "unknown" (50/50) branch weights to this branch.

This PR includes fies for the following tests:
```
Transforms/AtomicExpand/AArch64/atomicrmw-fp.ll
Transforms/AtomicExpand/AArch64/pcsections.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-agent.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-f32-system.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-agent.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-f64-system.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-nand.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-simplify-cfg-CAS-block.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-v2bf16-agent.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-v2bf16-system.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-v2f16-agent.ll
Transforms/AtomicExpand/AMDGPU/expand-atomic-v2f16-system.ll
Transforms/AtomicExpand/AMDGPU/expand-atomicrmw-fp-vector.ll
Transforms/AtomicExpand/ARM/atomicrmw-fp.ll
Transforms/AtomicExpand/LoongArch/atomicrmw-fp.ll
Transforms/AtomicExpand/Mips/atomicrmw-fp.ll
Transforms/AtomicExpand/PowerPC/atomicrmw-fp.ll
Transforms/AtomicExpand/RISCV/atomicrmw-fp.ll
Transforms/AtomicExpand/SPARC/libcalls.ll
Transforms/AtomicExpand/X86/expand-atomic-rmw-fp.ll
Transforms/AtomicExpand/X86/expand-atomic-rmw-initial-load.ll
Transforms/AtomicExpand/X86/expand-atomic-xchg-fp.ll
```

Co-authored-by: Jin Huang <jingold@google.com>
2025-11-05 09:33:09 -08:00
Mircea Trofin
ff108f7486
Fix failures introduced in #166032 (#166574) 2025-11-05 08:02:55 -08:00
Jin Huang
fa5cd27ef0
[profcheck] Add unknown branch weights to expand LL/SR loop. (#166273)
As a follow-up to PR#165841, this change addresses `prof_md` metadata
loss in AtomicExpandPass when lowering `atomicrmw xchg` to a
Load-Linked/Store-Exclusive (LL/SC) loop.

This path is distinct from the LSE path addressed previously:

PR #165841 (and its tests) used `-mtriple=aarch64-linux-gnu`, which
targets a modern **ARMv8.1+** architecture. This architecture supports
**Large System Extensions (LSE)**, allowing `atomicrmw` to be lowered
directly to a more efficient hardware instruction.

This PR (and its tests) uses `-mtriple=aarch64--` or
`-mtriple=armv8-linux-gnueabihf`. This indicates an `ARMv8.0 or lower
architecture that does not support LSE`. On these targets, the pass must
fall back to synthesizing a manual LL/SC loop using the `ldaxr/stxr`
instruction pair.

Similar to previous issue, the new conditional branch was failin to
inherit the `prof_md` metadata. Theis PR correctly fix the branch
weights to the newly created branch within the LL/SC loop, ensuring
profile information is preserved.

Co-authored-by: Jin Huang <jingold@google.com>
2025-11-04 16:23:34 -08:00
Kazu Hirata
358513f662
[llvm] Replace LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]] (NFC) (#163330)
This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]] where
we do not need to move the position of [[maybe_unused]] within
declarations.

Notes:

- [[maybe_unused]] is a standard feature of C++17.

- The compiler is far more lenient about the placement of
  __attribute__((unused)) than that of [[maybe_unused]].

I'll follow up with another patch to finish up the rest.
2025-10-14 07:15:44 -07:00
zhijian lin
36cb33bbca
support branch hint for AtomicExpandImpl::expandAtomicCmpXchg (#152366)
The patch add branch hint for AtomicExpandImpl::expandAtomicCmpXchg, For
example: in PowerPC, it support branch hint as

```
loop:
    lwarx r6,0,r3   #  load and reserve
    cmpw r4,r6      #1st 2 operands equal? bne- exit  #skip if not
    bne- exit       #skip if not
    stwcx. r5,0,r3  #store new value if still res’ved bne- loop #loop if lost reservation
    bne- loop #loop if lost reservation
exit:
    mr  r4,r6       #return value from storage
```

`-`  hints not taken,
`+` hints taken,
2025-09-02 09:33:28 -04:00
Pierre van Houtryve
8b9b0fdedf
[CodeGen][TLI] Allow targets to custom expand atomic load/stores (#154708)
Loads didn't have the `Expand` option in `AtomicExpandPass`. Stores had
`Expand` but it didn't defer to TLI and instead did an action directly.
Add a `CustomExpand` option and make it always map to the TLI hook for
all cases. The `Expand` option now refers to a generic expansion for all
targets.
2025-08-28 09:58:10 +02:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Matt Arsenault
ed1ee9a9bf
AtomicExpand: Stop using report_fatal_error (#147300)
Emit a context error and delete the instruction. This
allows removing the AMDGPU hack where some atomic libcalls
are falsely added. NVPTX also later copied the same hack,
so remove it there too.

For now just emit the generic error, which is not good. It's
missing any useful context information (despite taking the instruction).
It's also confusing in the failed atomicrmw case, since it's reporting
failure at the intermediate failed cmpxchg instead of the original
atomicrmw.
2025-07-09 15:28:10 +09:00
AZero13
7119a0f39b
[AtomicExpandPass] Match isIdempotentRMW with InstcombineRMW (#142277)
Add umin, smin, umax, smax to isIdempotentRMW
2025-06-08 11:32:20 +01:00
Jonathan Thackray
6e49f73825
Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-30 22:06:37 +01:00
Jonathan Thackray
7ee0097b48
Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657)
Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4
2025-04-28 16:53:36 +01:00
Jonathan Thackray
ba420d8122
[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-28 15:31:44 +01:00
Akshay Deodhar
9638d08af9
[NVPTX] Support for memory orderings for cmpxchg (#126159)
So far, all cmpxchg instructions were lowered to atom.cas. This change
adds support for memory orders in lowering. Specifically:
- For cmpxchg which are emulated, memory ordering is enforced by adding
fences around the emulation loops.
- For cmpxchg which are lowered to PTX directly, where the memory order
is supported in ptx, lower directly to the correct ptx instruction.
- For seq_cst cmpxchg which are lowered to PTX directly, use a sequence
(fence.sc; atom.cas.acquire) to provide the semantics that we want.

Also adds tests for all possible combinations of (size, memory ordering,
address space, SM/PTX versions)

This also adds `atomicOperationOrderAfterFenceSplit` in TargetLowering,
for specially handling seq_cst atomics.
2025-02-24 10:13:23 -08:00
Matt Arsenault
04d450fd8d
AtomicExpand: Preserve metadata when bitcasting fp atomicrmw xchg (#115240) 2024-11-13 12:51:18 -08:00
Kazu Hirata
735ab61ac8
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
2024-11-12 23:15:06 -08:00
Matt Arsenault
30dd1297fa
AMDGPU: Custom expand flat cmpxchg which may access private (#109410)
64-bit flat cmpxchg instructions do not work correctly for scratch
addresses, and need to be expanded as non-atomic.

Allow custom expansion of cmpxchg in AtomicExpand, as is
already the case for atomicrmw.
2024-11-04 09:29:38 -08:00
Matt Arsenault
9cc298108a
AtomicExpand: Copy metadata from atomicrmw to cmpxchg (#109409)
When expanding an atomicrmw with a cmpxchg, preserve any metadata
attached to it. This will avoid unwanted double expansions
in a future commit.

The initial load should also probably receive the same metadata
(which for some reason is not emitted as an atomic).
2024-10-31 11:54:07 -07:00
Matt Arsenault
5326614e2f
AtomicExpand: Really allow incremental legalization (#108613)
Fix up 100d9b89947bb1d42af20010bb594fa4c02542fc. The iterator
fixes ended up defeating the point, since newly inserted blocks
were not visited. This never erases the current block, so we can
simply not preincrement the block iterator.

The AArch64 FP atomic tests now expand the cmpxchg in the second
round of legalization.
2024-09-20 08:18:33 +04:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
2024-09-06 16:19:20 +01:00
Matt Arsenault
100d9b8994 Reapply "AtomicExpand: Allow incrementally legalizing atomicrmw" (#107307)
This reverts commit 63da545ccdd41d9eb2392a8d0e848a65eb24f5fa.

Use reverse iteration in the instruction loop to avoid sanitizer
errors. This also has the side effect of avoiding the AArch64
codegen quality regressions.

Closes #107309
2024-09-06 18:37:34 +04:00
Vitaly Buka
63da545ccd
Revert "Reland "AtomicExpand: Allow incrementally legalizing atomicrmw"" (#107307)
Reverts llvm/llvm-project#106793

`Next == E` is not enough:
https://lab.llvm.org/buildbot/#/builders/169/builds/2834

`Next` is deleted by `processAtomicInstr`
2024-09-04 13:43:41 -07:00
Vitaly Buka
06286832db
Reland "Revert "AtomicExpand: Allow incrementally legalizing atomicrmw"" (#106793)
Reverts llvm/llvm-project#106792

The first commit of PR is pure revert, the rest is a possible fix.
2024-09-04 10:29:03 +04:00
Vitaly Buka
982d2445f2
Revert "AtomicExpand: Allow incrementally legalizing atomicrmw" (#106792)
Reverts llvm/llvm-project#103371

There is `heap-use-after-free`, commented on
206b5aff44a95754f6dd7a5696efa024e983ac59

Maybe `if (Next == E || BB != Next->getParent()) {` is enough,
but not sure, what was the intent there,
2024-08-30 13:51:53 -07:00
Matt Arsenault
206b5aff44
AtomicExpand: Allow incrementally legalizing atomicrmw (#103371)
If a lowering changed control flow, resume the legalization
loop at the first newly inserted block.

This will allow incrementally legalizing atomicrmw and cmpxchg.

The AArch64 test might be a bugfix. Previously it would lower
the vector FP case as a cmpxchg loop, but cmpxchgs get lowered
but previously weren't. Maybe it shouldn't be reporting cmpxchg
for the expand type in the first place though.
2024-08-30 19:11:45 +04:00
Matt Arsenault
109b50808f AtomicExpand: Add assert that atomicrmw is an xchg
It turns out it's trivial to hit this path with any rmw
operation.
2024-08-14 10:55:52 +04:00
Matt Arsenault
2d7a2c1212
AtomicExpand: Refactor atomic instruction handling (#102914)
Move the processing of an instruction into a helper function. Also
avoid redundant checking for all types of atomic instructions.
Including the assert, it was effectively performing the same check
3 times.
2024-08-13 19:51:53 +04:00
Sergei Barannikov
4e93b16f3f
[llvm] Make InstSimplifyFolder constructor explicit (NFC) (#101654) 2024-08-02 16:45:50 +03:00
Joseph Huber
615b7eeaa9 Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.

I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
2024-07-20 09:29:31 -05:00
NAKAMURA Takumi
740161a9b9 Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69.
(llvmorg-19-init-17714-gc05126bdfc3b)
See #99610
2024-07-20 12:36:57 +09:00
Joseph Huber
c05126bdfc
[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevent internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts-out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but does not want the overhead of
uncalled functions. (This adds like a second to the link time trivially)
2024-07-16 06:22:09 -05:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Matt Arsenault
f3afdc4ad9
AtomicExpand: Fix creating invalid ptrmask for fat pointers (#94955)
The ptrmask intrinsic requires the integer mask to be the index size,
not the pointer size.
2024-06-12 10:45:42 +02:00
Matt Arsenault
d81170873c
AtomicExpand: Preserve metadata when expanding partword RMW (#89769)
This will be important for AMDGPU in a future patch.
2024-05-23 10:04:47 +02:00
Matt Arsenault
7927bcdb8a
AMDGPU: Do not bitcast atomicrmw in IR (#90045)
This is the first step to eliminating shouldCastAtomicRMWIInIR. This and
the other atomic expand casting hooks should be removed. This adds
duplicate legalization machinery and interfaces. This is already what
codegen is supposed to do, and already does for the promotion case.

In the case of atomicrmw xchg, there seems to be some benefit to having
the bitcasts moved outside of the cmpxchg loop on targets with separate
int and FP registers, which we should be able to deal with by directly
checking for the legality of the underlying operation.

The casting path was also losing metadata when it recreated the
instruction.
2024-05-07 18:26:32 +02:00
Matt Arsenault
a45eb62877 AtomicExpand: Fix dropping a syncscope when bitcasting atomicrmw 2024-04-24 19:09:34 +02:00
Pierre van Houtryve
cf328ff96d
[IR] Memory Model Relaxation Annotations (#78569)
Implements the core/target-agnostic components of Memory Model
Relaxation Annotations.

RFC:
https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5
2024-04-24 08:52:25 +02:00
Matt Arsenault
31af5e9001 AtomicExpand: Emit or with constant on RHS
This will save later code from commuting it.
2024-04-23 15:00:31 +02:00
Matt Arsenault
4cb110a84f
[RFC] IR: Support atomicrmw FP ops with vector types (#86796)
Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of
floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x
bfloat> on some targets and address spaces.

Note this only supports the proper floating-point operations; float
vector typed xchg is still not supported. cmpxchg still only supports
integers, so this inserts bitcasts for the loop expansion.

I have support for fp vector typed xchg, and vector of int/ptr
separately implemented but I don't have an immediate need for those
beyond feature consistency.
2024-04-06 15:27:45 -04:00
Kevin P. Neal
fe893c93b7
[FPEnv][AtomicExpand] Correct strictfp attribute handling in AtomicExpandPass (#87082)
The AtomicExpand pass was lowering function calls with the strictfp
attribute to sequences that included function calls incorrectly lacking
the attribute. This patch corrects that.

The pass now also emits the correct constrained fp call instead of
normal FP instructions when in a function with the strictfp attribute.

Test changes verified with D146845.
2024-03-29 14:54:51 -04:00
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
Craig Topper
79fec2f8ba
[AtomicExpand][RISCV] Call shouldExpandAtomicRMWInIR before widenPartwordAtomicRMW (#80947)
This gives the target a chance to keep an atomicrmw op that is smaller
than the minimum cmpxchg size. This is needed to support the Zabha
extension for RISC-V which provides i8/i16 atomicrmw operations, but
does not provide an i8/i16 cmpxchg or LR/SC instructions.

This moves the widening until after the target requests
LLSC/CmpXChg/MaskedIntrinsic expansion. Once we widen, we call
shouldExpandAtomicRMWInIR again to give the target another chance to
make a decision about the widened operation.

I considered making the targets return AtomicExpansionKind::Expand or a
new expansion kind for And/Or/Xor, but that required the targets to
special case And/Or/Xor which they weren't currently doing.
2024-02-07 08:24:50 -08:00
Bjorn Pettersson
e53b28c833 [llvm] Drop some bitcasts and references related to typed pointers
Differential Revision: https://reviews.llvm.org/D157551
2023-08-10 15:07:07 +02:00
Matt Arsenault
35be9e2903 AtomicExpand: Preserve syncscope when expanding partword atomics 2023-08-08 14:38:06 -04:00
Bjorn Pettersson
4ce7c4a92a [llvm] Drop some typed pointer handling/bitcasts
Differential Revision: https://reviews.llvm.org/D157016
2023-08-03 22:54:33 +02:00
Matt Arsenault
3701ebe76b AtomicExpand: Fix expanding atomics into unconstrained FP in strictfp functions
Ideally the normal fadd/fmin/fmax this was creating would fail the verifier.
It's probably also necessary to force off FP exception handlers in the cmpxchg
loop but we don't have a generic way to do that now.

Note strictfp builder is broken in the minnum/maxnum case

https://reviews.llvm.org/D154993
2023-07-11 18:51:15 -04:00
Matt Arsenault
778cf5431c IR: Add atomicrmw uinc_wrap and udec_wrap
These are essentially add/sub 1 with a clamping value.

AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
2023-01-24 17:55:11 -04:00
Nadeem, Usman
c9821abfc0 [WoA] Use fences for sequentially consistent stores/writes
LLVM currently uses LDAR/STLR and variants for acquire/release
as well as seq_cst operations. This is fine as long as all code uses
this convention.

Normally LDAR/STLR act as one way barriers but when used in
combination they provide a sequentially consistent model. i.e.
when an LDAR appears after an STLR in program order the STLR
acts as a two way fence and the store will be observed before
the load.

The problem is that normal loads (unlike ldar), when they appear
after the STLR can be observed before STLR (if my understanding
is correct). Possibly providing weaker than expected guarantees if
they are used for ordered atomic operations.

Unfortunately in Microsoft Visual Studio STL seq_cst ld/st are
implemented using normal load/stores and explicit fences:
dmb ish + str + dmb ish
ldr + dmb ish

This patch uses fences for MSVC target whenever we write to the
memory in a sequentially consistent way so that we don't rely on
the assumptions that just using LDAR/STLR will give us sequentially
consistent ordering.

Differential Revision: https://reviews.llvm.org/D141748

Change-Id: I48f3500ff8ec89677c9f089ce58181bd139bc68a
2023-01-23 16:09:11 -08:00