The spec can be found at
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74.
1. Add the new extension GroupID/Bitmask with the latest hwprobe key.
2. Update `initRISCVFeature`.
3. Update `EmitRISCVCpuSupports`, since the feature bits are no longer
limited to group 0 (see the sketch below).
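For illustration, a rough sketch (all names here are hypothetical, not the
actual runtime interface from the spec) of what a feature query looks like
once the bits span more than one group:

```cpp
// Hypothetical sketch: each group carries its own 64-bit feature mask,
// so a query must select the group before testing the bitmask.
struct FeatureBits {
  unsigned long long Groups[2]; // group 0 plus the newly added group
};

static bool hasFeature(const FeatureBits &FB, unsigned GroupID,
                       unsigned long long Bitmask) {
  return (FB.Groups[GroupID] & Bitmask) == Bitmask;
}
```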
This PR adds the length intrinsic and an HLSL function that uses it.
The SPIRV implementation is left for a future PR.
This PR addresses #99134, though some SPIR-V changes still need to be
made to complete the task. Below is how this PR addresses #99134.
- "Implement `length` clang builtin" was done by defining `HLSLL ength`
in Builtins.td
- "Link `length` clang builtin with hlsl_intrinsics.h" was done by using
the alias attribute to make `length` an alias of
`__builtin_hlsl_elementwise_length` in hlsl_intrinsics.h
- "Add sema checks for `length` to `CheckHLSLBuiltinFunctionCall` in
`SemaChecking.cpp` " was done, but in this case not in SemaChecking.cpp,
rather SemaHLSL.cpp. A case was added to the builtin to check for
semantic failures, and set `TheCall` up to have the right return type.
- "Add codegen for `length` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`"
was done. For scalars, fabs is emitted, otherwise, length is emitted.
- "Add codegen tests to `clang/test/CodeGenHLSL/builtins/length.hlsl`
was done to test that `length` in HLSL emits the right intrinsic.
- "Add sema tests to `clang/test/SemaHLSL/BuiltIns/length-errors.hlsl`"
was done to test for the diagnostics emitted in SemaHLSL.cpp.
- "Create the `int_dx_length` intrinsic in `IntrinsicsDirectX.td`" was
done. Specifying return types and parameter types was difficult, but
`idot` was used for reference, and `llvm/include/llvm/IR/Intrinsics.td`
contains all the ways to express return / parameter types.
- "Create an intrinsic expansion of `int_dx_length` in
`llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp`" was done, and was
mostly derived by looking at `TranslateLength` in `HLOperationLower.cpp`
in the DXC codebase.
- "Create the `length.ll` and `length_errors.ll` tests in
`llvm/test/CodeGen/DirectX/`" was done by taking the DXIL output of
`clang/test/CodeGenHLSL/builtins/length.hlsl` and running `opt -S
-dxil-intrinsic-expansion` and `opt -S -dxil-op-lower` on it, checking
for how the length intrinsic was either expanded or lowered.
- "Create the `int_spv_length` intrinsic in `IntrinsicsSPIRV.td`" was
done by copying `IntrinsicsDirectX.td`.
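As a rough illustration of the codegen split mentioned in the
`EmitHLSLBuiltinExpr` bullet above (not the actual CGBuiltin.cpp code), the
emitted operation behaves like:

```cpp
// Illustrative only: scalars reduce to fabs, vectors to sqrt(dot(v, v)).
#include <cmath>
#include <cstddef>

float lengthScalar(float X) { return std::fabs(X); }

float lengthVector(const float *V, size_t N) {
  float Dot = 0.0f;
  for (size_t I = 0; I != N; ++I)
    Dot += V[I] * V[I]; // dot(v, v)
  return std::sqrt(Dot);
}
```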
---------
Co-authored-by: Justin Bogner <mail@justinbogner.com>
As with other loops, we need only look at a RecordDecl's FieldDecls.
Convert to using them. While we're at it, we can improve the generation
of the 'counted_by' FieldDecl's GEP by creating one GEP instead of a
series of GEPs, as sketched below.
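A minimal sketch of the single-GEP idea (the helper name and surrounding
context are hypothetical, not the actual CodeGen code):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Collect one index per nested FieldDecl and emit a single GEP,
// instead of emitting one struct GEP per nesting level.
static Value *emitCountedByGEP(IRBuilder<> &B, Type *OuterTy, Value *Base,
                               ArrayRef<unsigned> FieldIndices) {
  SmallVector<Value *, 4> Idxs;
  Idxs.push_back(B.getInt32(0)); // step through the base pointer
  for (unsigned Idx : FieldIndices)
    Idxs.push_back(B.getInt32(Idx));
  return B.CreateInBoundsGEP(OuterTy, Base, Idxs, "counted_by.gep");
}
```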
Summary:
Currently there are several layers to handle `printf`. Since we now have
varargs and an implementation of `printf`, this can be heavily
simplified.
1. The frontend renames `printf` into `omp_vprintf` and gives it an
argument buffer.
2. This is forwarded to CUDA vprintf or ignored.
We no longer need this special handling since we have varargs: we now
forward `printf` to CUDA vprintf if we have libc, and otherwise leave
`printf` as an external function and expect that `libc` will be linked
in. Removing step 1 triggered some code in the AMDGPU backend meant for
HIP / OpenCL, so I added an exception to it.
The MMX instruction set is legacy, and the SSE2 variants are in every
way superior, when they are available -- and they have been available
since the Pentium 4 was released, 20 years ago.
Therefore, we are switching the "MMX" intrinsics to depend on SSE2,
unconditionally. This change entirely drops the ability to generate
vectorized code using compiler intrinsics for chips with MMX but without
SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released
1997-1999), as well as AMD K6 and K7 series chips of around the same
timeframe. Targeting these older CPUs remains supported -- simply
without the ability to use MMX compiler intrinsics.
Migrating away from the use of MMX registers also fixes a rather
non-obvious requirement. The long-standing programming model for these
MMX intrinsics requires that the programmer be aware of the x87/MMX
mode-switching semantics, and manually call `_mm_empty()` between using
any MMX instruction and any x87 FPU instruction. If you neglect to, then
every future x87 operation will return a NaN result. This requirement is
not at all obvious to users of these intrinsic functions, and causes
very difficult-to-detect bugs.
Worse, even if the user did write code that correctly calls
`_mm_empty()` in the right places, LLVM may sometimes reorder x87 and
MMX operations around each other, unaware of this mode-switching issue.
Eliminating the use of MMX registers eliminates this problem.
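To make the old requirement concrete, here is a sketch of code that was
only correct under the legacy model when compiled for 32-bit x86 (where
`double` math uses the x87 FPU):

```cpp
#include <mmintrin.h>

double scaled(double X) {
  __m64 A = _mm_set1_pi16(1);
  __m64 B = _mm_add_pi16(A, A); // MMX op: switches the FPU into MMX mode
  (void)B;
  _mm_empty();                  // mandatory before any subsequent x87 math
  return X * 2.0;               // x87 op: would see NaN without _mm_empty()
}
```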
This change also deletes the now-unnecessary MMX `__builtin_ia32_*`
functions from Clang. Only 3 MMX-related builtins remain in use --
`__builtin_ia32_emms`, used by `_mm_empty`, and
`__builtin_ia32_vec_{ext,set}_v4si`, used by `_mm_insert_pi16` and
`_mm_extract_pi16`. Note particularly that the latter two lower to
generic, non-MMX, IR. Support for the LLVM intrinsics underlying these
removed builtins still remains, for the moment.
The file `clang/www/builtins.py` has been updated with mappings from the
newly-removed `__builtin_ia32` functions to the still-supported
equivalents in `mmintrin.h`.
(Originally uploaded at https://reviews.llvm.org/D86855 and
https://reviews.llvm.org/D94252)
Fixes issue #41665
Works towards #98272
This implements the __builtin_cpu_init and __builtin_cpu_supports
builtin routines based on the compiler runtime changes in
https://github.com/llvm/llvm-project/pull/85790.
This is inspired by https://github.com/llvm/llvm-project/pull/85786.
Major changes are (a) a restriction in scope to only the builtins (which
have a much narrower user interface) and (b) the avoidance of false
generality. This change deliberately only handles group 0 extensions
(which happen to be all defined ones today), and avoids the tblgen
changes from that review.
I don't have an environment in which I can actually test this, but @BeMg
has been kind enough to report that this appears to work as expected.
Before this can make it into a release, we need a change such as
https://github.com/llvm/llvm-project/pull/99958. The gcc docs claim that
`__builtin_cpu_supports` can be called by "normal" code without calling
the `__builtin_cpu_init` routine, because the init routine will have been
called by a high-priority constructor. Our current compiler-rt mechanism
does not do this.
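A hedged usage sketch (the extension name string is illustrative):

```cpp
// Until the constructor mechanism mentioned above is in place, call
// __builtin_cpu_init() explicitly before querying features.
bool haveZba() {
  __builtin_cpu_init();
  return __builtin_cpu_supports("zba");
}
```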
half and bfloat are common types for 16-bit elements. Support for them
was originally there but was dropped for various reasons. This work adds
support for these float types back.
This set of instructions was only supported by AMD chips, starting with
the K6-2 (introduced 1998) and ending before the "Bulldozer" family
(2011). They were never much used, as they were effectively superseded
by the more-widely-implemented SSE (first implemented on the AMD side
in Athlon XP in 2001).
This is being done as a precursor to the general removal of MMX
register usage. Since there is almost no usage of the 3DNow!
intrinsics, and no modern hardware even implements them, simple
removal seems like the best option.
(Clang half originally uploaded in https://reviews.llvm.org/D94213)
Works towards issue #41665 and issue #98272.
Add __hlt, which is an MSVC ARM64 intrinsic.
This intrinsic is just the HLT instruction. MSVC's version seems to
return something undefined; in this patch it will just return zero.
MSVC intrinsics are defined at
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics.
I used unsigned int as the return type, because that is what the MSVC
intrin.h header uses, even though it conflicts with the documentation.
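A hedged usage sketch (the immediate-operand form is an assumption based on
the HLT instruction's encoding; exact argument handling may differ):

```cpp
#include <intrin.h>

unsigned int stop() {
  return __hlt(0); // emits HLT #0; returns zero per this patch
}
```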
This enables the AMDGPU-specific implementation of `printf` when
compiling for AMDGCN-flavoured SPIR-V, the consequence being that the
expansion into ROCDL calls & friends happens before "lowering" to
SPIR-V and gets carried through. The only relatively "novel" aspect is
that `callAppendStringN` is simplified to take the types of the
passed-in arguments, as opposed to querying them from the module. This
is a neutral change, since the arguments were passed directly to the
call without any attempt to cast them, hence the assumption that the
actual types match the formal ones was already baked in.
This patch addresses a null pointer dereference issue, reported by a
static analyzer, in the `EmitSVETupleSetOrGet()` and
`EmitSVETupleCreate()` functions. Previously, the functions assumed that
the result of `dyn_cast<>` to `ScalableVectorType` would always be
non-null, which is not guaranteed.
The fix introduces a null check after the `dyn_cast<>` operation. If the
cast fails and `SingleVecTy` is null, the function now returns `nullptr`
to indicate an error. This prevents the dereference of a null pointer,
which could lead to undefined behavior. Additionally, the assert message
has been corrected to accurately reflect the expected conditions.
These changes collectively enhance the robustness of the code by
ensuring type safety and preventing runtime errors due to improper type
casting.
These are incremental changes over #89217, with the core logic being the
same. This patch, along with #89217 and #91190, should get us ready to
enable 64-bit optimizations in the atomic optimizer.
The builtin causes the program to stop its execution abnormally and
shows a human-readable description of the reason for the termination
when a debugger is attached or in a symbolicated crash log.
The motivation for the builtin is explained in the following RFC:
https://discourse.llvm.org/t/rfc-adding-builtin-verbose-trap-string-literal/75845
Clang's CodeGen lowers the builtin to `llvm.trap` and emits debugging
information that represents an artificial inline frame whose name
encodes the category and reason strings passed to the builtin.
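A hedged usage sketch; the two string literals become the category and
reason encoded into the artificial inline frame's name:

```cpp
int checkedDeref(int *P) {
  if (!P)
    __builtin_verbose_trap("check failure", "null pointer dereference");
  return *P;
}
```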
This patch is intended to be the first of a series whose end goal is to
adapt the atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64-bit readlane,
writelane, and readfirstlane ops pre-ISel.
---------
Co-authored-by: vikramRH <vikhegde@amd.com>
This is a constant-expression equivalent to
ptrauth_sign_unauthenticated. Its constant nature lets us guarantee
a non-attackable sequence is generated, unlike
ptrauth_sign_unauthenticated, which we generally discourage using.
Being a constant also allows its use in global initializers, though it
requires constant pointers and discriminators.
The value must be a constant expression of pointer type which evaluates
to a non-null pointer.
The key must be a constant expression of type ptrauth_key.
The extra data must be a constant expression of pointer or integer type;
if an integer, it will be coerced to ptrauth_extra_data_t.
The result will have the same type as the original value.
This can be used in constant expressions.
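A hedged sketch of a signed global initializer (the key choice and the
zero discriminator are illustrative):

```cpp
#include <ptrauth.h>

static void handler(void) {}

// Signed entirely at compile time, so it is usable as a constant
// initializer, unlike ptrauth_sign_unauthenticated.
static void (*const SignedHandler)(void) =
    ptrauth_sign_constant(&handler, ptrauth_key_function_pointer, 0);
```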
Co-authored-by: John McCall <rjmccall@apple.com>
We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.
This also upgrades the behavior to respect a volatile-qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
Relanding this PR now that
https://github.com/llvm/llvm-project/pull/90503 has merged. With `FTAN`
landing in
[TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63),
there is now an LLVM tan intrinsic 32/64/128 Expand case for all LLVM
backends.
In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.
- `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
- `ConstrainedOps.def` - map `tan` and `constrained_tan` to an ISD
opcode.
resolves #91421
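A hedged sketch of source that should exercise the constrained path (for
example when built with `-ffp-model=strict`; exact flags may vary):

```cpp
#include <cmath>

// Expected to lower to llvm.experimental.constrained.tan rather than
// the plain tan intrinsic under a strict FP model.
double strictTan(double X) {
  return std::tan(X);
}
```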
---------
Co-authored-by: Farzon Lotfi <farzon@farzon.com>
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:
- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, in which case the extant SPIRV datalayout
string would lead to pointers to functions pointing to the private
address space, which would be wrong.
Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.
- `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
- `ConstrainedOps.def` - map `tan` and `constrained_tan` to an ISD
opcode.
- `ISDOpcodes.h` - define tan and strict tan opcodes
resolves #91421
All of these are inbounds, as they access known offsets in fixed
globals. NFCI, because constant expression construction currently
already infers this; this patch just makes it explicit.
This reuses most of the code that was created for f32x4 and f64x2 binary
instructions and tries to follow how they were implemented.
add/sub/mul/div - use regular LL instructions
min/max - use the minimum/maximum intrinsic, and also have builtins
pmin/pmax - use the wasm.pmax/pmin intrinsics and also have builtins
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
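A hedged usage sketch (the builtin name is assumed to follow the existing
f32x4 naming pattern):

```cpp
typedef _Float16 f16x8 __attribute__((__vector_size__(16)));

// Lane-wise minimum of two f16x8 values via the min builtin.
f16x8 laneMin(f16x8 A, f16x8 B) {
  return __builtin_wasm_min_f16x8(A, B);
}
```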
Using MMRAs, allow `__builtin_amdgcn_fence` to emit fences that only
target one or more address spaces, instead of fencing all address spaces
at once.
This is done through an `amdgpu-as` MMRA. Currently focused on OpenCL
fences, but can very easily support more AS names and codegen on more
than just fences.
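A hedged usage sketch (the exact argument spelling is an assumption): the
trailing strings name the address spaces the fence applies to:

```cpp
// Fence only the "local" (LDS) address space at workgroup scope.
void releaseLocal() {
  __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup", "local");
}
```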
Adds a builtin and intrinsic for the f16x8.splat instruction.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f16x8.splat as opcode 0x123, but this is
incorrect and will be changed to 0x120 soon.
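A hedged usage sketch (builtin name assumed): broadcast one half-precision
value to all eight lanes:

```cpp
typedef _Float16 f16x8 __attribute__((__vector_size__(16)));

f16x8 splat(_Float16 X) {
  return __builtin_wasm_splat_f16x8(X);
}
```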