Ideally the normal fadd/fmin/fmax this was creating would fail the verifier.
It's probably also necessary to force off FP exception handlers in the cmpxchg
loop but we don't have a generic way to do that now.
Note strictfp builder is broken in the minnum/maxnum case
https://reviews.llvm.org/D154993
These are essentially add/sub 1 with a clamping value.
AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127041