This adds support for all the surface read and write calls to clang. It
extends the pattern used for textures to surfaces too.
I tested this by generating all the various permutations of the calls
and argument types in a python script, compiling them with both clang
and nvcc, and comparing the generated ptx for equivilence. They all
agree, ignoring register allocation, and some places where Clang picks
different memory write instructions. An example kernel is:
```
__global__ void testKernel(cudaSurfaceObject_t surfObj, int x, float2* result) {
*result = surf1Dread<float2>(surfObj, x, cudaBoundaryModeZero);
}
```
---------
Signed-off-by: Austin Schuh <austin.linux@gmail.com>
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.
The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on__patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.
Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:
https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.
This feature is only available on gfx9/10.
A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
For btf_type_tag implementation, in order to have the same results with
clang (__attribute__((btf_type_tag("...")))), gcc intends to use c2x
syntax '[[...]]'. Clang also supports similar c2x syntax. Currently, the
clang selftest contains the following five tests:
```
attr-btf_type_tag-func.c
attr-btf_type_tag-similar-type.c
attr-btf_type_tag-var.c
attr-btf_type_tag-func-ptr.c
attr-btf_type_tag-typedef-field.c
```
Tests attr-btf_type_tag-func.c and attr-btf_type_tag-var.c already have
c2x syntax test.
Test attr-btf_type_tag-func-ptr.c does not support c2x syntax when
'__attribute__((...))' is replaced with with '[[...]]'. This should not
be an issue since we do not have use cases for function pointer yet.
This patch added '[[...]]' syntax for
```
attr-btf_type_tag-similar-type.c
attr-btf_type_tag-typedef-field.c
```
The combination of `-fcomplex-arithmetic=promoted` and `mno-x87` for
`double` complex division is leading to a crash.
See https://godbolt.org/z/189G957oY
This patch fixes that.
Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files.
This PR depends on https://github.com/llvm/llvm-project/pull/127797
This patch updates the semantics of template arguments in intrinsic
names for clarity and ease of use. Previously, template argument numbers
indicated which character in the prototype string determined the final
type suffix, which was confusing—especially for intrinsics using
multiple prototype modifiers per operand (e.g., intrinsics operating on
arrays of vectors). The number had to reference the correct character in
the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and
error-prone.
With this patch, template argument numbers now refer to the operand
number that determines the final type suffix, providing a more intuitive
and consistent approach.
Currently arm_neon.h emits C-style casts to do vector type casts. This
relies on implicit conversion between vector types to be enabled, which
is currently deprecated behaviour and soon will disappear. To ensure
NEON code will keep working afterwards, this patch changes all this
vector type casts into bitcasts.
Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Certain functions require the `returns_twice` attribute in order to
produce correct codegen. However, `-fno-builtin` removes all knowledge
of functions that require this attribute, so this PR modifies Clang to
add the `returns_twice` attribute even if `-fno-builtin` is set. This
behavior is also consistent with what GCC does.
It's not (easily) possible to get the builtin information from
`Builtins.td` because `-fno-builtin` causes Clang to never initialize
any builtins, so functions never get tokenized as functions/builtins
that require `returns_twice`. Therefore, the most straightforward
solution is to explicitly hard code the function names that require
`returns_twice`.
Fixes#122840
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.
This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).
So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
Relands #132907 with a fix in the testcase:
clang/test/CodeGen/Mips/subtarget-feature-test.c
enable this test for only mips64 target
PR #130587 defined same SubTargetFeature for CPUs i6400 and i6500 which
resulted into following warning when -mcpu=i6500 was used:
+i6500' is not a recognized feature for this target (ignoring feature)
This PR fixes above issue by defining separate SubTargetFeature for
i6500.
PR #130587 defined same SubTargetFeature for CPUs i6400 and i6500 which
resulted into following warning when -mcpu=i6500 was used:
+i6500' is not a recognized feature for this target (ignoring feature)
This PR fixes above issue by defining separate SubTargetFeature for
i6500.
Removes attr-target-version.c which doesn't have a clear purpose.
Introduces AArch64/fmv-detection.c to check detection bitmasks.
Adds coverage in AArch64/fmv-resolver-emission.c
This reverts commit 0dba5381d8c8e4cadc32a067bf2fe5e3486ae53d.
YMM rounding was removed from AVX10 whitepaper. Ref:
https://cdrdv2.intel.com/v1/dl/getContent/784343
The MINMAX and SATURATING CONVERT instructions will be removed as a
follow up.
- Add '_' after cvt[t]s intrinsics when 's' is for saturation;
- Add 's_' for all ipcvt[t] intrinsics since they are all saturation
ones;
- Move 's' after 'cvt' and add '_' after it for prior `biass`
intrinsics;
This is to solve potential confusion since 's' before a type usually
represents for scalar.
Synced with GCC folks and they will change in the same way.
When `-fcomplex-arithmetic=promoted` is set complex divassign `/=` should
promote to a wider type the same way division (without assignment) does.
Prior to this change, Smith's algorithm would be used for divassign.
Fixes: https://github.com/llvm/llvm-project/issues/131129
- Add tests for complex divdent and real divisor
- Add tests for complex * real multiplication
- Add tests for multiply/divide and assign (`/=`,`*=`) operators
This builtin is supported by GCC and is a way to improve diagnostic
behavior for va_start in C23 mode. C23 no longer requires a second
argument to the va_start macro in support of variadic functions with no
leading parameters. However, we still want to diagnose passing more than
two arguments, or diagnose when passing something other than the last
parameter in the variadic function.
This also updates the freestanding <stdarg.h> header to use the new
builtin, same as how GCC works.
Fixes#124031
Clang has __builtin_elementwise_exp and __builtin_elementwise_exp2
intrinsics, but no __builtin_elementwise_exp10.
There doesn't seem to be a good reason not to expose the exp10 flavour
of this intrinsic too.
This commit introduces this intrinsic following the same pattern as the
exp and exp2 versions.
Fixes: SWDEV-519541
The parantheses are unnecessary IMO because they should have been
handled in the parents of such expressions, e.g. in CXXFunctionalCastExpr.
Moreover, we shouldn't join CXXDefaultInitExpr either because they are
not printed at all.
Reverts
c40f0fe434
Original description:
This updates the existing modf[f|l] builtin to be lowered via the
llvm.modf.* intrinsic (rather than directly to a library call).
The Windows 32-bit x86 missing `modff` symbol issue should have been
solved in: https://github.com/llvm/llvm-project/pull/130636.
This implements the R2 semantics of P0963.
The R1 semantics, as outlined in the paper, were introduced in Clang 6.
In addition to that, the paper proposes swapping the evaluation order of
condition expressions and the initialization of binding declarations
(i.e. std::tuple-like decompositions).
This broke modff calls on 32-bit x86 Windows. See comment on the PR.
> This updates the existing modf[f|l] builtin to be lowered via the
> llvm.modf.* intrinsic (rather than directly to a library call).
>
> The legalization issues exposed by the original PR (#126750) should have
> been fixed in #128055 and #129264.
This reverts commit cd1d9a8fab05524a27ffdb251f6def37786b5cc1.
FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.
- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2
FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so different amounts of bytes are
being pushed onto the stack depending on our target, which would
affect alignment for subsequent saves.
DPRCS1 will sum up all previous bytes that were saved, and will emit
extra instructions to ensure that its alignment is correct. My
assumption is that if DPRCS1 is able to correct its alignment to be
correct, then all subsequent saves will also have correct alignment.
Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup. Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.
Co-authored-by: Jake Vossen <jake@vossen.dev>
Co-authored-by: Alan Phipps <a-phipps@ti.com>
Use the regular code paths for interpreting.
Add new instructions: `StartSpeculation` will reset the diagnostics
pointers to `nullptr`, which will keep us from reporting any diagnostics
during speculation. `EndSpeculation` will undo this.
The rest depends on what `Emitter` we use.
For `EvalEmitter`, we have no bytecode, so we implement `speculate()` by
simply visiting the first argument of `__builtin_constant_p`. If the
evaluation fails, we push a `0` on the stack, otherwise a `1`.
For `ByteCodeEmitter`, add another instrucion called `BCP`, that
interprets all the instructions following it until the next
`EndSpeculation` instruction. If any of those instructions fails, we
jump to the `EndLabel`, which brings us right before the
`EndSpeculation`. We then push the result on the stack.