There are two related issues here. On the declaration/definition side,
we need to make sure the markings are conservative. Then on the caller
side, we need to make sure we don't access parameters that don't exist.
Fixes #187535.
Matrix loads and stores are accesses of their element types. Emit TBAA
nodes using their element type to allow more precise TBAA alias
analysis.
PR: https://github.com/llvm/llvm-project/pull/190029
This commit adds the GetDimensions methods to Texture2D. For DXIL, it
requires intrinsics that are not yet available. They are added, but not
implemented.
Assisted-by: Gemini
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
Add a missing OBTrapInvolved check before EmitIntegerSignChangeCheck().
This is considered "missing" as a previous attempt (https://github.com/llvm/llvm-project/pull/185772) to properly add an `__ob_trap` backdoor missed this particular instance.
This backdoor is needed because we want `__ob_trap` types to be picky about implicit conversions (including implicit sign change):
```c
unsigned int __ob_trap big = 4294967295;
(signed int)big; // should trap!
```
Move the `OBTrapInvolved` setup to the top of the function so it can be used in all the places we need it.
This adds the CalculateLevelOfDetail and CalculateLevelOfDetailUnclamped
methods to Texture2D, following the established pattern used for other methods.
Assisted-by: Gemini
Sometimes we use an array of bytes to represent `_BitInt` types in memory.
In that case, the lowered array filler expression reaches
`ConstantEmitter::emitForMemory` already in its memory type, which is an
array of i8 rather than a single iN, so the `cast<llvm::ConstantInt>`
within `ConstantEmitter::emitForMemory` was failing. This patch fixes the
assertion failure by not attempting any type change if the type is
already correct.
Fixes https://github.com/llvm/llvm-project/issues/189643
Assisted-by: Claude (fixing FileCheck CHECK lines)
Implements isMemcpyEquivalentSpecialMember in CIR codegen so that
trivial copy/move constructors and defaulted union copy/move ops emit a
cir.copy directly instead of making a real constructor call. The logic
is shared with OG codegen by moving the implementation into ASTContext,
where it also gains the pointer field protection (PFP) check that was
previously missing in CIR.
The original attempt (#187051) produced a regression for
`intel-sycl-gpu` because `SPIRVEmitNonSemanticDI` now self-activates
whenever `llvm.dbg.cu` is present, which removed the need for the
explicit `--spv-emit-nonsemantic-debug-info` flag.
The pass is now entered unconditionally for all SPIR-V targets, but
`NonSemantic.Shader.DebugInfo.100` requires the
`SPV_KHR_non_semantic_info` extension. Targets like `spirv64-intel` do not enable
that extension by default. When `checkSatisfiable()` ran on those
targets, it issued a fatal error rather than silently skipping.
This patch adds an early-out to `emitGlobalDI()`: if
`SPV_KHR_non_semantic_info` is not available for the current target, the
pass returns without emitting anything.
See discussion in #183347.
Added a separate test case rather than reusing
destructor-dead-on-return.cpp because we need to test the functionality
of the deleting destructor, for which update_cc_test_checks.py does not
add check lines.
Sane front-ends using the common calling conventions normally avoid
having multiple sret parameters along with a return value, so this is
NFCI. However, multiple sret parameters can be valid. This rewrites an
odd-looking single-element DenseMap, which was only needed for
iteration, into a more sensible vector.
Noted in https://github.com/llvm/llvm-project/pull/181740 review.
This function made no sense at all. It was scanning through
the feature map looking for something that parsed as an OffloadArch.
Directly compute the arch from the target device.
I don't know why there isn't just an OffloadArch in TargetOpts,
this shouldn't really require parsing.
Now also mark the this pointer dead_on_return for classes with a
non-zero number of base classes. We saw a limited number of failures
internally due to this change, so it doesn't seem like there are too
many problems with real world deployment.
This change adds two builtins for AMDGPU:
- `__builtin_amdgcn_processor_is`, which is similar in observable
behaviour to `__builtin_cpu_is`, except that it is never "evaluated"
at run time;
- `__builtin_amdgcn_is_invocable`, which is behaviourally similar to
`__has_builtin`, except that it is not a macro (i.e. not evaluated at
preprocessing time).
Neither of these is `constexpr`, even though when compiling for
concrete (i.e. `gfxXXX` / `gfxXXX-generic`) targets they get evaluated
in Clang, so they shouldn't tear the AST too badly / at all for
multi-pass compilation cases like HIP. They can only be used in specific
contexts (as args to control structures).
The motivation for adding these is two-fold:
- as a nice to have, it provides an AST-visible way to incorporate
architecture specific code, rather than having to rely on macros and the
preprocessor, which burn in the choice quite early;
- as a must have, it allows featureful AMDGCN flavoured SPIR-V to be
produced, where target specific capability is guarded and chosen or
discarded when finalising compilation for a concrete target; this is
built atop the Specialization Constant concept which is described in the
SPIR-V specification under section [2.12
Specialization](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_specialization_2)
I've tried to keep the overall footprint of the change small. The
changes to Sema are a bit unpleasant, but there was a strong desire to
have Clang validate these, and to constrain their uses, and this was the
most compact solution I could come up with (suggestions welcome).
---------
Co-authored-by: Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>
Co-authored-by: Voicu <avoicu@amd.com>
In LTO, part of LLVM's middle-end runs after linking has finished. LTO's
semantics depend on the complete set of extracted bitcode files being
known at this time. If the middle-end inserts new calls to library
functions (libfuncs) that are implemented in bitcode, this could extract
new bitcode object files into the link. These cannot be compiled,
leading to undefined symbol references.
Additionally, the middle-end in LTO may reason that such library
functions have no references, and it may internalize them, then
manipulate their API or even delete them. Afterwards, it may emit a call
to them, again producing undefined symbol references.
This patch resolves the former issue by ensuring that the middle end
emits no new references to symbols defined in bitcode, and it resolves
the latter issue by ensuring that extracted bitcode for libfuncs is
considered external, since new calls may be emitted to them at any time.
The new semantics are not yet established for MachO LLD, which does not
yet appear to have any special handling for libcalls in LTO. It also
does not yet support distributed ThinLTO; doing so would require
additional (de)serialization work.
This is the patch referenced in @ilovepi's and my talk at the last LLVM
devmeeting: "LT-Uh-Oh"
Gemini 3.1 was used in porting to COFF and WASM LLDs.
## TL;DR
This is a stack of PRs implementing features that expose the ABI of
direct methods.
You can see the RFC, design, and discussion
[here](https://discourse.llvm.org/t/rfc-optimizing-code-size-of-objc-direct-by-exposing-function-symbols-and-moving-nil-checks-to-thunks/88866).
- https://github.com/llvm/llvm-project/pull/170616: Flag
`-fobjc-direct-precondition-thunk` set up
- https://github.com/llvm/llvm-project/pull/170617: Code refactoring to
ease later reviews
- https://github.com/llvm/llvm-project/pull/170618: **Thunk generation**
- https://github.com/llvm/llvm-project/pull/170619: Optimizations; some
class objects can be known to be realized
## Implementation details
### Dispatching
- `GetDirectMethodCallee` handles the dispatching logic. Previously, we
only needed to call `GenerateDirectMethod` to get the declaration of a
direct method.
- `GenerateDirectMethod` first attempts to acquire the declaration of
the implementation, and returns it if the flag is not set.
- A thunk is generated and returned if we can't dispatch to the true
implementation (i.e. we can't prove that the receiver is non-null, or
the class object may not be realized).
### Precondition check thunk generation
- `GenerateObjCDirectThunk` generates the thunk, it is called on demand
by `GetDirectMethodCallee`
- The thunk inherits all attributes from the true implementation; see
`StartObjCDirectThunk` for more detail.
- `StartObjCDirectThunk` and `FinishObjCDirectThunk` follow the design
pattern of `StartThunk` and `FinishThunk` in CGVTable.
### Precondition check inline generation
- If the function needs to have its precondition check inlined
(`shouldHaveNilCheckInline`), the caller emits the nil check during
`EmitMessageSend`
- Class realization is generated inline
- No extra nil check is generated - we reuse `NullReturnState` to emit
the nil check for us, it already emits nil check at caller side to
handle `ns_consumed`, we just need to tell `NullReturnState` to do the
work by setting the flag `RequiresNullCheck |= ReceiverCanBeNull;`
### Visibility and linkage
- Visibility is still `Hidden` by default, but `StartObjCMethod` now
respects source-level visibility attributes, so methods with
`__attribute__((visibility("default")))` can be used in other linking units
- Linkage is by default `External`
## Tests
- `expose-direct-method.m` follows the example of `direct-method.m`
- `direct-method-ret-mismatch.m` makes sure we can handle the corner case
- `expose-direct-method-consumed.m` and
`expose-direct-method-linkedlist.m`: executable tests (Mac only) that
validate ARC correctness
- `expose-direct-method-varargs.m`
- `expose-direct-method-visibility-linkage.m`
BasicBlock::getTerminator() is frequently called on valid IR, yet the
function has to check that the last instruction is in fact a terminator,
even in release builds. This check can only be optimized away when the
instruction is dereferenced.
Therefore, introduce the functions hasTerminator() and
getTerminatorOrNull() as replacement and require (assert) that
getTerminator() always returns a valid terminator. As a side effect,
this forces explicit expression of intent at call sites when unfinished
basic blocks should be supported.
This removes dyn_cast invocations where the argument is already of the
target type (including through subtyping). This was created by adding a
static assert in dyn_cast and letting an LLM iterate until the code base
compiled. I then went through each example and cleaned it up. This does
not commit the static assert in dyn_cast, because it would prevent a lot
of uses in templated code. To prevent backsliding we should instead add
an LLVM aware version of
https://clang.llvm.org/extra/clang-tidy/checks/readability/redundant-casting.html
(or expand the existing one).
The code generated for calls with FPCC-eligible structs as arguments
doesn't take the bit-field into account, which results in a store that
crosses the boundary of the memory allocated with alloca. For example,
for the code:
```c
struct __attribute__((packed, aligned(1))) S {
const float f0;
unsigned f1 : 1;
};
unsigned func(struct S arg)
{
return arg.f1;
}
```
The generated IR is:
```
define dso_local signext i32 @func(
float [[TMP0:%.*]], i32 [[TMP1:%.*]]) #[[ATTR0:[0-9]+]] {
[[ENTRY:.*:]]
[[ARG:%.*]] = alloca [[STRUCT_S:%.*]], align 1
[[TMP2:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 0
store float [[TMP0]], ptr [[TMP2]], align 1
[[TMP3:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 1
store i32 [[TMP1]], ptr [[TMP3]], align 1
[[F1:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[ARG]], i32 0, i32 1
[[BF_LOAD:%.*]] = load i8, ptr [[F1]], align 1
[[BF_CLEAR:%.*]] = and i8 [[BF_LOAD]], 1
[[BF_CAST:%.*]] = zext i8 [[BF_CLEAR]] to i32
ret i32 [[BF_CAST]]
```
Here, `store i32 [[TMP1]], ptr [[TMP3]], align 1` crosses the boundary
of the allocated memory. After optimization (EarlyCSEPass), all that is
left of the IR is:
```
define dso_local noundef signext i32 @func(
float [[TMP0:%.*]], i32 [[TMP1:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
[[ENTRY:.*:]]
ret i32 0
```
This patch trims the second member of the struct, taking the bit-field
width into account when choosing the appropriate integer type; the test
shows the result.
Note that the bug is seen only when `f` extension is enabled for FPCC
eligibility.
Co-authored-by: muhammad.kamran4 <muhammad.kamran@esperantotech.com>
When functions marked with `[[gnu::warning/error]]` are called through inlined functions, we now show the inlining chain that led to the call when `-fdiagnostics-show-inlining-chain` is enabled.
With this flag, two modes are possible:
- **heuristic** mode: Uses `srcloc` and `inlined.from` metadata to reconstruct the inlining chain. Functions that are `inline`, `static`, `always_inline`, or in anonymous namespaces get `srcloc` metadata attached. This mode emits a note suggesting `-gline-directives-only` for more accurate locations.
- **debug** mode: Automatically used instead of heuristic when building with at least `-gline-directives-only` (implied by `-g1` or higher). Leverages `DILocation` debug info for reliable source locations.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1571
Make sure pattern exclusions have priority over the overflow behavior types when deciding whether or not to emit truncation checks.
Accomplish this by carrying an extra field through `ScalarConversionOpts` which we later check before emitting instrumentation.
Resolves: https://github.com/llvm/llvm-project/issues/164150
C++26 allows constexpr packs in structured bindings. This is a new
feature (the code doesn't compile with anything lower than -std=c++26),
so it was previously unhandled in clang.
This makes clang aware of packs and handles them as one constant unit
instead of materializing them as separate mutable reference temporaries,
which allows LLVM to optimize them.
With this change, the example code from the issue compiles to the
following, as you would expect, even without compiling for Zen 5 (the
good codegen described in the issue):
```asm
movq %rdi, %rax
movups (%rsi), %xmm0
movups %xmm0, (%rdi)
movups (%rdx), %xmm0
movups %xmm0, 16(%rdi)
retq
```
Adds the `GroupMemoryBarrier()` HLSL function to SPIRV and DirectX with
additional tests for the different backends.
Once this lands, I will create another PR using this as a template for
the other barriers:
- `AllMemoryBarrier()` #99076
- `AllMemoryBarrierWithGroupSync()` #99090
- `DeviceMemoryBarrier()` #99105
- `DeviceMemoryBarrierWithGroupSync()` #99106
`Barrier()` does not have support for SPIRV, so I will exclude that from
the next PR.
- [x] Implement GroupMemoryBarrier clang builtin,
- [x] Link GroupMemoryBarrier clang builtin with hlsl_intrinsics.h
- [x] Add sema checks for GroupMemoryBarrier to
CheckHLSLBuiltinFunctionCall in SemaChecking.cpp
- [x] Add codegen for GroupMemoryBarrier to EmitHLSLBuiltinExpr in
CGBuiltin.cpp
- [x] Add codegen tests to
clang/test/CodeGenHLSL/builtins/GroupMemoryBarrier.hlsl
- [x] Add sema tests to
clang/test/SemaHLSL/BuiltIns/GroupMemoryBarrier-errors.hlsl
- [x] Create the int_dx_GroupMemoryBarrier intrinsic in
IntrinsicsDirectX.td
- [x] Create the DXILOpMapping of int_dx_GroupMemoryBarrier to 80 in
DXIL.td
- [x] Create the GroupMemoryBarrier.ll and GroupMemoryBarrier_errors.ll
tests in llvm/test/CodeGen/DirectX/
- [x] Create the int_spv_GroupMemoryBarrier intrinsic in
IntrinsicsSPIRV.td
- [x] In SPIRVInstructionSelector.cpp create the GroupMemoryBarrier
lowering and map it to int_spv_GroupMemoryBarrier in
SPIRVInstructionSelector::selectIntrinsic.
- [x] Create SPIR-V backend test case in
llvm/test/CodeGen/SPIRV/hlsl-intrinsics/GroupMemoryBarrier.ll
The existing implementation has three issues which this patch addresses.
1. The last dimension, which represents the bytes in the type, has the
wrong stride and count. For example, for a 4-byte int, count=1 and
stride=4. The correct representation is count=4 and stride=1,
because there are 4 bytes (count=4) that we need to copy and we do not
skip any bytes (stride=1).
2. The size of the data copy was computed using the last dimension.
However, this is incorrect in cases where some of the final dimensions
get merged into one. In this case we need to take the combined size of
the merged dimensions, which is (Count * Stride) of the first merged
dimension.
3. The Offset into a dimension was computed as a multiple of its Stride.
However, this Stride, which is in bytes, already includes the stride
multiplier given by the user. This means that when the user specified
1:3:2, i.e. elements 1, 3, 5, the runtime incorrectly copied elements 2,
4, 6. Fix this by precomputing the Offset in bytes at compile time,
multiplying the offset by the stride of the dimension without the
user-specified multiplier.
Add the visibility override in setGlobalVisibility(), following the
existing OpenMP precedent. Unlike the AMDGPU post-hoc override, this
check respects explicit [[gnu::visibility("hidden")]] attributes
via isVisibilityExplicit().
When targeting runtimes that support constant literal classes, emit ObjC
literal expressions @(number), @[], and @{} as compile-time constant
data structures rather than runtime msgSend calls. This reduces code
size and runtime overhead at the cost of increased data segment size,
and avoids repeated heap allocation of equivalent literal objects.
The feature is not supported with the fragile ABI or GNU runtimes, where
it is automatically disabled.
The feature can be disabled altogether with -fno-objc-constant-literals,
or individually per literal kind:
- -fno-constant-nsnumber-literals
- -fno-constant-nsarray-literals
- -fno-constant-nsdictionary-literals
Custom backing class names can be specified via:
- -fconstant-array-class=<name>
- -fconstant-dictionary-class=<name>
- -fconstant-integer-number-class=<name>
- -fconstant-float-number-class=<name>
- -fconstant-double-number-class=<name>
rdar://45380392
rdar://168106035
---------
Co-authored-by: Ben D. Jones <bendjones@apple.com>
This PR adds QuadReadAcrossY intrinsic support in HLSL with codegen for
both DirectX and SPIRV backends. Resolves
https://github.com/llvm/llvm-project/issues/99176.
- [x] Implement `QuadReadAcrossY` clang builtin,
- [x] Link `QuadReadAcrossY` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `QuadReadAcrossY` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `QuadReadAcrossY` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/QuadReadAcrossY.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/QuadReadAcrossY-errors.hlsl`
- [x] Create the `int_dx_QuadReadAcrossY` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_QuadReadAcrossY` to `123` in
`DXIL.td`
- [x] Create the `QuadReadAcrossY.ll` and `QuadReadAcrossY_errors.ll`
tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_QuadReadAcrossY` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `QuadReadAcrossY`
lowering and map it to `int_spv_QuadReadAcrossY` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/QuadReadAcrossY.ll`
This adds two related changes to HLSL debug info support in the SPIR-V
backend. It's a first small step towards the plan I described in
https://discourse.llvm.org/t/hlsl-spirv-nsdi-debug-info-support-for-clang-dxc/90149.
## Tag HLSL shaders with `DW_LANG_HLSL` in the front-end
`GetSourceLanguage()` in `clang/lib/CodeGen/CGDebugInfo.cpp` checked
`LO.CPlusPlus` before `LO.HLSL`. Since HLSL is compiled as C++, the HLSL
check was never reached. Shaders compiled with `-g` were tagged with
`DW_LANG_C_plus_plus_14` instead of `DW_LANG_HLSL`. The NSDI pass
already had the correct mapping for `DW_LANG_HLSL` but it was never
triggered.
This fixes #136929 and #136995.
## Make `SPIRVEmitNonSemanticDI` activate automatically when `-g` is used
`SPIRVPassConfig::addPreEmitPass()` only scheduled
`SPIRVEmitNonSemanticDI` when `--spv-emit-nonsemantic-debug-info` was
set or the target vendor was AMD. Passing `-g` to clang had no effect on
the SPIR-V backend pass.
The pass is now added unconditionally and self-activates by checking for
`llvm.dbg.cu` in the module. When no debug metadata is present it exits
early with no effect. This avoids the need to inspect module metadata at
pass-configuration time, which is not reliably available.
`--spv-emit-nonsemantic-debug-info` is now a deprecated synonym for
`-g`.
The alternative to the unconditional pass approach is to check at
pass-configuration time whether the module was compiled with debug info
(e.g. via `TargetOptions::DebugInfoForProfiling` or a similar flag
forwarded from the driver). I went with the unconditional approach
because it is simpler and the pass is cheap to enter and exit when no
`llvm.dbg.cu` is present.
I'm not sure whether adding a pass unconditionally is acceptable. Does
this sound reasonable, or would it be better to implement the
flag-forwarding approach?
Changes to tests:
- `clang/test/CodeGenHLSL/` (new): verifies that `-g` on an HLSL SPIR-V
target produces `DebugCompilationUnit` with language code 5
(`DW_LANG_HLSL`).
- `llvm/test/CodeGen/SPIRV/debug-info/hlsl-debug-info-auto-activation.ll`
(new): verifies that a module with `llvm.dbg.cu` and `DW_LANG_HLSL`
produces `DebugCompilationUnit` without
`--spv-emit-nonsemantic-debug-info`.
- Existing `debug-compilation-unit.ll`, `debug-type-basic.ll`,
`debug-type-pointer.ll`: updated to verify NSDI is emitted whenever
debug metadata is present.
- `llc-pipeline.ll`: updated to reflect that `SPIRVEmitNonSemanticDI` is
now always in the pipeline.
---------
Co-authored-by: Eric Christopher <echristo@gmail.com>
When targeting arm64e, vtable pointers are signed with a discriminator
that incorporates the object's address
(PointerAuthVTPtrAddressDiscrimination) and class type
(PointerAuthVTPtrTypeDiscrimination).
I had to make a small change to clang, specifically in
getPointerAuthDeclDiscriminator(). Previously, that was computing the
discriminator based on getMangledName(). The latter returns the
AsmLabelAttr, which for functions imported by lldb, is prefixed with
`$__lldb_func`, causing a different discriminator to be generated.
`llvm.loop.licm.disable` is already available at the LLVM IR level to
disable LICM per loop. This PR simply exposes that capability to
developers at the clang level.
This attribute is similar to the already implemented ext_builtin_input
attribute.
One important bit is the `static` storage class: HLSL uses static
differently than C/C++. This is a known weirdness:
See https://github.com/microsoft/hlsl-specs/issues/350
In C/C++, when we declare a variable as 'extern', we usually expect
another module to provide the symbol. In HLSL, the pipeline will
'declare' the symbol. Hence, in this case, we need to emit the global
variable.
Related WG-HLSL:
https://github.com/llvm/wg-hlsl/blob/main/proposals/0031-semantics.md
---------
Co-authored-by: Steven Perron <stevenperron@google.com>
Clang now constructs calls to it using the default program address space from the DataLayout.
Co-authored-by: Alex Richardson <alexrichardson@google.com>