The conversion code always ended up just getting the type of Src from
the Src argument itself, with no virtual users of this, so there is no
point in also providing this API hook. Fix the documentation as well,
since it seems DestAddr must have been similarly removed at some point
in the past from the API but was still documented.
Also fixes CIR to actually return the casted value!
Summary:
The `__scoped_atomic` builtins are supposed to match the standard
GNU-flavored `__atomic` builtins. We added a scoped builtin without a
corresponding standard one before the fork so this should be added in
the release candidate. These were originally added in
https://github.com/llvm/llvm-project/pull/168666
Also, the name `uinc_wrap` does not follow the naming convention. The
GNU atomics use `fetch_xyz` to indicate that the builtin returns the
previous location's value as part of the RMW operation, which these do.
This PR renames it and its uses.
This PR extends __scoped_atomic builtins with inc and dec functions.
They map to LLVM IR `atomicrmw uinc_wrap` and `atomicrmw udec_wrap`.
These enable implementation of OpenCL-style atomic_inc / atomic_dec with
wrap semantics on targets supporting scoped atomics (e.g. GPUs).
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
d076608d58d1ec55016eb747a995511e3a3f72aa moved some deps around to avoid
cycles and left clang/Frontend/FrontendDiagnostic.h as a shim that
simply includes clang/Basic/DiagnosticFrontend.h. This PR inlines it so
that nothing in tree still includes clang/Frontend/FrontendDiagnostic.h.
Doing this will help prevent future layering issues. See #162865.
Frontend already depends on Basic, so no new deps need to be added
anywhere except for places that do strict dep checking.
`UnqualPtrTy` didn't always match `llvm::PointerType::getUnqual`:
sometimes it returned a pointer that is not in address space 0 (notably
for SPIRV).
Since `UnqualPtrTy` was used as the "generic" or "default" pointer type,
this patch renames it to `DefaultPtrTy` to avoid confusion with LLVM's
`PointerType::getUnqual`.
fixes#30023
The issue is that for compare exchange builtin, if the type's size is
not power of 2, it creates a temporary of size power of 2, then emit the
compare exchange operation. And later, the results of the compare
exchange operation has two components: 1. a boolean whether or not the
exchange happens. 2. the old value
we are supposed to write the old value into user's "expected" value.
However, in case the type is not power of 2, what we actually wrote to
is the temporary that was created.
The fix is to pass the "expected" address all the way down so it can
wrote to the correct address
Previously when using min_fetch/max_fetch atomics with floating point
types, LLVM would emit a crash. This patch updates the
EmitPostAtomicMinMax function in CGAtomic.cpp to take floating point
types. Included is a clang CodeGen test atomic-ops-float-check-minmax.c
and Sema test atomic-ops-fp-minmax.c.
Update the `operator%` overload that accepts `CharUnits` to return
`CharUnits` to match the other `operator%`. This is more logical than
returning an `int64` and cleans up users that want to continue to do
math with the result.
Many users of this were explicitly comparing against 0. I considered
updating these to compare against `CharUnits::Zero` or even introducing
an `explicit operator bool()`, but they all feel clearer if we update
them to use the existing `isMultipleOf()` function instead.
Clang CodeGen for `__atomic_test_and_set` would emit a `store`
instruction that stores an `i1` value:
```cpp
bool f(void *ptr) {
return __atomic_test_and_set(ptr, __ATOMIC_RELAXED);
}
```
```llvm
%1 = atomicrmw xchg ptr %0, i8 1 monotonic, align 1
%tobool = icmp ne i8 %1, 0
store i1 %tobool, ptr %atomic-temp, align 1
```
which could lead to suboptimal binary code, for example on x86_64:
```asm
f:
mov al, 1
xchg byte ptr [rdi], al
test al, al
setne al
setne byte ptr [rsp - 1]
ret
```
The last `setne` instruction is obviously redundant. This patch fixes
this issue by first zero-extending `%tobool` to an `i8` before the
store. This effectively eliminates the last `setne` instruction in the
binary code sequence. The `test` and `setne` on `al` is kept still,
though.
-----
I'm quite conservative about the codegen in this patch. Vanilla gcc
actually emits simpler code for `__atomic_test_and_set`:
```cpp
bool f(void *ptr) {
return __atomic_test_and_set(ptr, __ATOMIC_RELAXED);
}
```
```asm
f:
mov eax, 1
xchg al, BYTE PTR [rdi]
ret
```
It seems like gcc assumes `ptr` would always point to a valid `bool`
value as required by the ABI. I'm not sure if we should also make this
assumption.
Related to #121943 .
It isn't used and is redundant with the result pointer type argument.
A more reasonable API would only have LangAS parameters, or IR parameters,
not both. Not all values have a meaningful value for this. I'm also
not sure why we have this at all, it's not overridden by any targets and
further simplification is possible.
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code for to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.
This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.
Fixes https://github.com/llvm/llvm-project/issues/111293.
This is a re-land of #120449, modified to allow any non-const pointer
type for the first argument.
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code for to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.
This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.
Fixes#111293.
This change adds support for correctly lowering the `__scoped` Clang
builtins, and corresponding scoped LLVM instructions. These were
previously unconditionally lowered to Device scope, which is possibly incorrect.
Furthermore, the default / implicit scope is changed from Device (an
OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is not
OpenCL specific / can ingest IR coming from other language front-ends. OpenCL
defaulting to Device scope is now reflected in the front-end handling of atomic
ops, which seems preferable.
Use this to replace the emission of the amdgpu-unsafe-fp-atomics
attribute in favor of per-instruction metadata. In the future
new fine grained controls should be introduced that also cover
the integer cases.
Add a wrapper around CreateAtomicRMW that appends the metadata,
and update a few use contexts to use it.
This is in effect a revert of f139ae3d93797, as we have since gained a
more sophisticated way of doing extra IRGen with the addition of
RawAddress in #86923.
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
This reapplies d9a685a9dd589486e882b722e513ee7b8c84870c, which was
reverted because it broke ubsan bots. There seems to be a bug in
coroutine code-gen, which is causing EmitTypeCheck to use the wrong
alignment. For now, pass alignment zero to EmitTypeCheck so that it can
compute the correct alignment based on the passed type (see function
EmitCXXMemberOrOperatorMemberCallExpr).
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit
broke msan bots because LValue::IsKnownNonNull was uninitialized.
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
The casting of FP atomic loads and stores were always done by the
front-end, even though the AtomicExpandPass will do it if the target
requests it (which is the default).
This patch removes this casting in the front-end entirely.
In the beginning, Clang only emitted atomic IR for operations it knew
the
underlying microarch had instructions for, meaning it required
significant
knowledge of the target. Later, the backend acquired the ability to
lower
IR to libcalls. To avoid duplicating logic and improve logic locality,
we'd like to move as much as possible to the backend.
There are many ways to describe this change. For example, this change
reduces the variables Clang uses to decide whether to emit libcalls or
IR, down to only the atomic's size.
Summary:
The standard GNU atomic operations are a very common way to target
hardware atomics on the device. With more heterogenous devices being
introduced, the concept of memory scopes has been in the LLVM language
for awhile via the `syncscope` modifier. For targets, such as the GPU,
this can change code generation depending on whether or not we only need
to be consistent with the memory ordering with the entire system, the
single GPU device, or lower.
Previously these scopes were only exported via the `opencl` and `hip`
variants of these functions. However, this made it difficult to use
outside of those languages and the semantics were different from the
standard GNU versions. This patch introduces a `__scoped_atomic` variant
for the common functions. There was some discussion over whether or not
these should be overloads of the existing ones, or simply new variants.
I leant towards new variants to be less disruptive.
The scope here can be one of the following
```
__MEMORY_SCOPE_SYSTEM // All devices and systems
__MEMORY_SCOPE_DEVICE // Just this device
__MEMORY_SCOPE_WRKGRP // A 'work-group' AKA CUDA block
__MEMORY_SCOPE_WVFRNT // A 'wavefront' AKA CUDA warp
__MEMORY_SCOPE_SINGLE // A single thread.
```
Naming consistency was attempted, but it is difficult to capture to full
spectrum with no many names. Suggestions appreciated.
Update all callers to pass through the Address.
For the older builtins such as `__sync_*` and MSVC `_Interlocked*`,
natural alignment of the atomic access is _assumed_. This change
preserves that behavior. It will pass through greater-than-required
alignments, however.
This patch moves `ArraySizeModifier` before `Type` declaration so that it's complete at `ArrayTypeBitfields` declaration. It's also converted to scoped enum along the way.
- Update CodeGenTypeCache to use a single union for all pointers in
address space zero.
- Introduce a UnqualPtrTy in CodeGenTypeCache, and use that (for
example instead of llvm::PointerType::getUnqual) in some places.
- Drop some redundant bit/pointers casts from ptr to ptr.
Partial progress towards replacing `CreateElementBitCast`, as it no
longer does what its name suggests. Either replace its uses with
`Address::withElementType()`, or remove them if no longer needed.
Reviewed By: barannikov88, nikic
Differential Revision: https://reviews.llvm.org/D153314
Partial progress towards replacing in-tree uses of `Type::getPointerTo()`.
This needs to be done before deprecating the API.
Reviewed By: nikic, barannikov88
Differential Revision: https://reviews.llvm.org/D152321
LLVM IR already allows floating point type in atomicrmw.
Update clang atomic fetch max/min builtins to accept
floating point type like we did for fetch add/sub.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D150985
Fixes: SWDEV-401056
The rest of the fetch/op intrinsics were added in e13246a2ec3 but sub
was conspicuous by its absence.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D151701
MSVC allows interpreting volatile loads and stores, when combined with
/volatile:iso, as having acquire/release semantics. MSVC also exposes a
define, _ISO_VOLATILE, which allows users to enquire if this feature is
enabled or disabled.
To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().
While doing the rename, I've filled in element types in cases
where it was relatively obvious, but we're still left with 135
calls to the deprecated constructor.
CreateElementBitCast() can preserve the pointer element type in
the presence of opaque pointers, so use it in place of CreateBitCast()
in some places. This also sometimes simplifies the code a bit.