Recently, we have added a set of complex intrinsics on
the TMA, tcgen05, and Cvt family of instructions.
This patch captures the key learnings from our experience
so far and documents them as guidelines for future design.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
createCast in MergeFunctions did not consider ArrayTypes, which results
in the creation of a bitcast between ArrayTypes in the thunk function,
leading to an assertion failure in the provided test case.
The version of createCast in GlobalMergeFunctions does handle
ArrayTypes, so this common code has been factored out into the
IRBuilder.
2b11c7de4ae182496438e166cb6758d41b6e1740 introduced
`llvm/include/llvm/MC/MCAsmLexer.h` and made `AsmLexer` inherit from
`MCAsmLexer`, likely to allow target-specific parsers to depend solely
on `MCAsmLexer`. However, this separation now seems unnecessary and
confusing.
`MCAsmLexer` defines virtual functions with `AsmLexer` as its only
implementation, and `AsmLexer` itself has few extra public methods.
To simplify the codebase, this change merges MCAsmLexer.{h,cpp} into
AsmLexer.{h,cpp}. MCAsmLexer.h is temporarily kept as a forwarder.
Note: I doubt that a downstream lexer handling an assembly syntax
significantly different from the standard GNU Assembler syntax would
want to inherit from `MCAsmLexer`. Instead, it's more likely they'd
extend `AsmLexer` by adding new states and modifying its internal logic,
as seen with variables for MASM, M68k, and HLASM.
Resolves#99221
Key points: For SPIRV backend, it decompose into a `dot` followed a
`add`.
- [x] Implement dot2add clang builtin,
- [x] Link dot2add clang builtin with hlsl_intrinsics.h
- [x] Add sema checks for dot2add to CheckHLSLBuiltinFunctionCall in
SemaHLSL.cpp
- [x] Add codegen for dot2add to EmitHLSLBuiltinExpr in CGBuiltin.cpp
- [x] Add codegen tests to clang/test/CodeGenHLSL/builtins/dot2add.hlsl
- [x] Add sema tests to clang/test/SemaHLSL/BuiltIns/dot2add-errors.hlsl
- [x] Create the int_dx_dot2add intrinsic in IntrinsicsDirectX.td
- [x] Create the DXILOpMapping of int_dx_dot2add to 162 in DXIL.td
- [x] Create the dot2add.ll and dot2add_errors.ll tests in
llvm/test/CodeGen/DirectX/
This PR makes verification of .debug_names acceleration table
multithreaded. In local testing it improves verification of clang
.debug_names from four minutes to under a minute.
This PR relies on a current mechanism of extracting DIEs into a vector.
Future improvements can include creating API to extract one DIE at a
time, or grouping Entires into buckets by CUs and extracting before
parallel step.
Single Thread
4:12.37 real, 246.88 user, 3.54 sys, 0 amem,10232004 mmem
Multi Thread
0:49.40 real, 612.84 user, 515.73 sys, 0 amem, 11226292 mmem
ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated,
using ISD::UADDO_CARRY,ISD::USUBO_CARRY instead. Lowering the UADDO,
UADDO_CARRY, USUBO, USUBO_CARRY in the patch.
This reapplies #132522.
Previously casts of scalable m_ImmConstant splats weren't being folded
by ConstantFoldCastOperand, triggering the "Constant-fold of ImmConstant
should not fail" assertion.
There are no changes to the code in this PR, instead we just needed
#133207 to land first.
A test has been added for the assertion in
llvm/test/Transforms/InstSimplify/vec-icmp-of-cast.ll
@icmp_ult_sext_scalable_splat_is_true.
<hr/>
#118806 fixed an infinite loop in FoldShiftByConstant that could occur
when the shift amount was a ConstantExpr.
However this meant that FoldShiftByConstant no longer kicked in for
scalable vectors because scalable splats are represented by
ConstantExprs.
This fixes it by allowing scalable splats of non-ConstantExprs in
m_ImmConstant, which also fixes a few other test cases where scalable
splats were being missed.
But I'm also hoping that UseConstantIntForScalableSplat will eventually
remove the need for this.
I noticed this when trying to reverse a combine on RISC-V in #132245,
and saw that the resulting vector and scalar forms were different.
This patch changes how ZT0 table is modelled at LLVM-IR level. Currently
accesses to ZT0 are represented at LLVM-IR level as memory reads and
writes. This patch changes that and models them as purely Inaccessible
memory accesses without any unmodeled side-effects.
I believe this method was not supposed to be public, as it has
additional preconditions (it will misbehave when called on a non-empty
DenseMap).
The public API for this is reserve().
CK is using either inline assembly or inline LLVM-IR builtins to
generate buffer_load_dword lds instructions.
This patch exposes this instruction as a Clang builtin available on gfx9 and gfx10.
Related to SWDEV-519702 and SWDEV-518861
Some new registers are reused when replacing some old ones in
certain use case of ModuloScheduleExpander. It is necessary to
avoid repeated interval calculations for these registers.
In order to facilitate targets that only support masked loads/stores
on certain address spaces (AMDGPU will support them in an upcoming
patch, but only for address space 7), add an AddressSpace parameter
to isLegalMaskedLoad and isLegalMaskedStore
#118806 fixed an infinite loop in FoldShiftByConstant that could occur
when the shift amount was a ConstantExpr.
However this meant that FoldShiftByConstant no longer kicked in for
scalable vectors because scalable splats are represented by
ConstantExprs.
This fixes it by allowing scalable splats of non-ConstantExprs in
m_ImmConstant, which also fixes a few other test cases where scalable
splats were being missed.
But I'm also hoping that UseConstantIntForScalableSplat will eventually
remove the need for this.
I noticed this when trying to reverse a combine on RISC-V in #132245,
and saw that the resulting vector and scalar forms were different.
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
Currently iterators over EquivalenceClasses will iterate over std::set,
which guarantees the order specified by the comperator. Unfortunately in
many cases, EquivalenceClasses are used with pointers, so iterating over
std::set of pointers will not be deterministic across runs.
There are multiple places that explicitly try to sort the equivalence
classes before using them to try to get a deterministic order
(LowerTypeTests, SplitModule), but there are others that do not at the
moment and this can result at least in non-determinstic value naming in
Float2Int.
This patch updates EquivalenceClasses to keep track of all members via a
extra SmallVector and removes code from LowerTypeTests and SplitModule
to sort the classes before processing.
Overall it looks like compile-time slightly decreases in most cases, but
close to noise:
https://llvm-compile-time-tracker.com/compare.php?from=7d441d9892295a6eb8aaf481e1715f039f6f224f&to=b0c2ac67a88d3ef86987e2f82115ea0170675a17&stat=instructions
PR: https://github.com/llvm/llvm-project/pull/134075
This pr relands https://github.com/llvm/llvm-project/pull/133302.
It resolves two issues:
- Linking error during build,
[here](https://github.com/llvm/llvm-project/pull/133302#issuecomment-2767259848).
There was a missing dependency for `clangLex` for the
`ParseHLSLRootSignatureTest.cpp` unit testing. This library was added to
the dependencies to resolve the error. It wasn't caught previously as
the library was transitively linked in most build environments
- Warning of unused declaration,
[here](https://github.com/llvm/llvm-project/pull/133302#issuecomment-2767091368).
There was a usability line in `LexHLSLRootSignature.h` of the form
`using TokenKind = enum RootSignatureToken::Kind` which causes this
error. The declaration is removed from the header file to be used
locally in the `.cpp` files that use it.
Notably, the original pr would also exposed `clang::hlsl::TokenKind` to
everywhere it was included, which had a name clash with
`tok::TokenKind`. This is another motivation to change to the proposed
resolution.
---------
Co-authored-by: Finn Plummer <finnplummer@microsoft.com>
GCC sometimes produces warnings about `__attribute__((retain))` despite
`__has_attribute(retain)` being 1. See:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99587
The amount of users who benefit from the attribute is probably very
small compared to the amount of `-Werror` enabled builds or the desire
to keep `-Wattributes` enabled in the LLVM build. So for now drop usage
of the `retain` attribute in GCC builds.
The current API doesn't have a way to unset it. The query returns
an optional, but the set doesn't. Alternatively I could switch the
set to also use optional.
Not sure why the "fold-all" option naming didn't match the
variable "FoldPreOutputs", but I've preserved the difference.
More annoyingly, the pass name "normalize" does not match the pass
name IRNormalizer and should probably be fixed one way or the other.
Also the existing test coverage for the flags is lacking. I've added
a test that shows they parse, but we should have tests that they
do something.
Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files.
This PR depends on https://github.com/llvm/llvm-project/pull/127797
This patch updates the semantics of template arguments in intrinsic
names for clarity and ease of use. Previously, template argument numbers
indicated which character in the prototype string determined the final
type suffix, which was confusing—especially for intrinsics using
multiple prototype modifiers per operand (e.g., intrinsics operating on
arrays of vectors). The number had to reference the correct character in
the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and
error-prone.
With this patch, template argument numbers now refer to the operand
number that determines the final type suffix, providing a more intuitive
and consistent approach.
Some ELF targets don't use @ for relocation specifiers.
We should not report `error: invalid variant` when @ is used.
Attempt to make expr@specifier parsing less hacky.
https://reviews.llvm.org/D17938 introduced lowerRelativeReference to
give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an
`unnamed_addr` function, create a PLT-generating relocation. This was
intended for C++ relative vtables, but C++ relative vtable ended up
using DSOLocalEquivalent (lowerDSOLocalEquivalent).
This special treatment of `unnamed_addr` seems unusual.
Let's remove it. Only COFF needs an overload to generate a @IMGREL32
relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll).
Pull Request: https://github.com/llvm/llvm-project/pull/132684
The pattern
```LLVM
%shl = shl i32 %x, 31
%ashr = ashr i32 %shl, 31
```
would be combined to `G_EXT_INREG %x, 1` by GlobalISel. However
InstCombine normalizes this pattern to:
```LLVM
%and = and i32 %x, 1
%neg = sub i32 0, %and
```
This adds a combiner for this variant as well.
This reverts the revert commit 616f447fc84bdc7655117f1b303d895dc3b93e4d.
It includes updates to remaining users in Polly and Clang, to avoid
failures when building those projects.
Reverts llvm/llvm-project#133302
Reverting to inspect build failures that were introduced from use of the
`clang::Preprocessor` in unit testing, as well as, the warning about an
unused declaration. See linked issue for failures.
- defines the Parser class and an initial set of helper methods to
support consuming tokens. functionality is demonstrated through a simple
empty descriptor table test case
- defines an initial in-memory representation of a DescriptorTable
- implements a test harness that will be used to validate the correct
diagnostics are generated. it will construct a dummy pre-processor with
diagnostics consumer to do so
Implements the first part of
https://github.com/llvm/llvm-project/issues/126569
This adds DWARF generation for fixed-point types. This feature is needed
by Ada.
Note that a pre-existing GNU extension is used in one case. This has
been emitted by GCC for years, and is needed because standard DWARF is
otherwise incapable of representing these types.