This patch handles structs of fixed vectors and structs of arrays of fixed vectors correctly for the VLS calling convention in EmitFunctionProlog, EmitFunctionEpilog and EmitCall; see the sketch below.
Stacked on: https://github.com/llvm/llvm-project/pull/147173
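For reference, a minimal sketch of the kinds of aggregates this covers (type and function names are illustrative, using the `riscv_vls_cc` attribute described later in this series):
```
typedef int int32x4_t __attribute__((vector_size(16)));

struct fixed_vec_wrapper { int32x4_t v; };    // struct of fixed vector
struct fixed_vec_array   { int32x4_t v[2]; }; // struct of array of fixed vector

// Both aggregates now take the VLS path in EmitFunctionProlog/Epilog and EmitCall.
void __attribute__((riscv_vls_cc(256))) takes_structs(struct fixed_vec_wrapper a,
                                                      struct fixed_vec_array b);
```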
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed; all type nodes to which it could previously apply can now store the elaborated keyword and name qualifier, tail-allocating when present.
* TagTypes can now point to the exact declaration found when producing them, as opposed to the previous situation where only one TagType existed per entity. This increases the amount of type sugar retained, and can have several applications, for example in tracking module ownership and in other tools that care about source file origins, such as IWYU. These TagTypes are lazily allocated, in order to limit the increase in AST size.
This patch offers a significant performance benefit.
It greatly improves compilation time for [stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for `test_on2.cpp` in that project, which is the slowest-compiling test, this patch improves `-c` compilation time by about 7.2%, with the `-fsyntax-only` improvement being at ~12%.
It also shows good results on compile-time-tracker.

This patch also enables further optimizations in the future, and will reduce the performance impact of template specialization resugaring when that lands.
It also includes some miscellaneous drive-by fixes.
About the review: yes, the patch is huge, sorry about that. Part of the reason is that I started with the nested name specifier part, before the ElaboratedType part, but that had a huge performance downside, as ElaboratedType is a big performance hog. I didn't have the steam to go back and restructure the patch after the fact.
There are also a lot of internal API changes, and it made sense to remove ElaboratedType in one go, versus removing it from one type at a time, as that would present much more churn to the users. Also, the nested name specifier having a different API avoids missing changes related to how prefixes work now, which could make existing code compile but not work.
How to review: the important changes are all in `clang/include/clang/AST` and `clang/lib/AST`, along with important changes in `clang/lib/Sema/TreeTransform.h`.
The rest, and the bulk, of the changes are mostly consequences of the API changes.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just to make rebasing easier. I plan to rename it back after this lands.
Fixes #136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
This is similar to -msve-vector-bits, but for streaming mode: it
constrains the legal values of "vscale", allowing optimizations based on
that constraint.
This also fixes conversions between SVE vectors and fixed-width vectors
in streaming functions with -msve-vector-bits and
-msve-streaming-vector-bits.
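For illustration, a minimal sketch of the kind of conversion this enables (a sketch only, assuming SME is enabled and the streaming vector length is pinned with something like `-msve-streaming-vector-bits=256`; names are illustrative):
```
#include <arm_sve.h>

typedef int32_t gnu_int32x8_t __attribute__((vector_size(32))); // 256-bit GNU vector

// With the streaming vector length fixed to 256 bits, converting between a
// sizeless SVE vector and a matching fixed-width GNU vector in a streaming
// function becomes well-defined.
svint32_t to_scalable(gnu_int32x8_t v) __arm_streaming {
  return v; // GNU-vector <-> SVE conversion, valid when the sizes match
}
```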
This rejects any use of arm_sve_vector_bits types in streaming
functions; if it becomes relevant, we could add
arm_sve_streaming_vector_bits types in the future.
This doesn't touch the __ARM_FEATURE_SVE_BITS define.
This change adds support for two SiFive vendor interrupt attribute values in clang:
- "SiFive-CLIC-preemptible"
- "SiFive-CLIC-stack-swap"
These can be given together, and can be combined with "machine", but
cannot be combined with any other interrupt attribute values.
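For example, one possible spelling (a sketch; as noted above, the values can be given together and combined with "machine"):
```
// Sketch: the SiFive CLIC values applied via the RISC-V interrupt attribute.
__attribute__((interrupt("SiFive-CLIC-preemptible", "SiFive-CLIC-stack-swap", "machine")))
void sifive_clic_handler(void);
```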
These are handled primarily in RISCVFrameLowering:
- "SiFive-CLIC-stack-swap" entails swapping `sp` with `sf.mscratchcsw`
at function entry and exit, which holds the trap stack pointer.
- "SiFive-CLIC-preemptible" entails saving `mcause` and `mepc` before
re-enabling interrupts using `mstatus`. To save these, `s0` and `s1`
are first spilled to the stack, and then the values are read into
these registers. If these registers are used in the function, their
values will be spilled a second time onto the stack with the generic
callee-saved-register handling. At the end of the function, interrupts are disabled again before `mepc` and `mcause` are restored.
This change also adds support for the following two experimental
extensions, which only contain CSRs:
- XSfsclic - for SiFive's CLIC Supervisor-Mode CSRs
- XSfmclic - for SiFive's CLIC Machine-Mode CSRs
The latter is needed for interrupt support.
The CFI information for this implementation is not correct, but I'd
prefer to correct this in a follow-up. While it's unlikely anyone wants
to unwind through a handler, the CFI information is also used by
debuggers so it would be good to get it right.
Co-authored-by: Ana Pazos <apazos@quicinc.com>
RISCVABIInfo::detectVLSCCEligibleStruct should exit early if the VLS calling convention is not used; however, the sentinel value was not set correctly: it should be zero instead of one.
This change adds support for `qci-nest` and `qci-nonest` interrupt attribute values. Both of these are machine-mode interrupts, which use instructions in Xqciint to push and pop the A- and T-registers (and a few others) to and from the stack.
In particular:
- `qci-nonest` uses `qc.c.mienter` to save registers at the start of the
function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qci-nest` uses `qc.c.mienter.nest` to save registers at the start of
the function, and uses `qc.c.mileaveret` to restore those registers and
return from the interrupt.
- `qc.c.mienter` and `qc.c.mienter.nest` both push registers ra, s0 (fp), t0-t6, and a0-a7 onto the stack (as well as some CSRs for the interrupt context). The difference between the two is that `qc.c.mienter.nest` re-enables M-mode interrupts.
- `qc.c.mileaveret` will restore the registers that were saved by
`qc.c.mienter(.nest)`, and return from the interrupt.
These work for both standard M-mode interrupts and the non-maskable
interrupt CSRs added by Xqciint.
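For reference, a sketch of how handlers would select these values (function names are illustrative):
```
// Sketch: selecting the Xqciint save/restore sequences via the interrupt attribute.
__attribute__((interrupt("qci-nest")))   void uart_irq_handler(void);  // qc.c.mienter.nest: re-enables M-mode interrupts
__attribute__((interrupt("qci-nonest"))) void timer_irq_handler(void); // qc.c.mienter: leaves them disabled
```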
The `qc.c.mienter`, `qc.c.mienter.nest` and `qc.c.mileaveret` instructions are compatible with push and pop instructions, inasmuch as they (mostly) only spill the A- and T-registers, so we can use the `Zcmp` or `Xqccmp` instructions to spill the S-registers. That combination (`qci-(no)nest` plus `Xqccmp`/`Zcmp`) is not implemented in this change.
The `qc.c.mienter(.nest)` instructions use a specific register storage order so that, when frame pointers are enabled, the frame-pointer linked list is preserved past the current interrupt handler and into the interrupted code and its frames.
Co-authored-by: Pankaj Gode <quic_pgode@quicinc.com>
This patch adds a function attribute `riscv_vls_cc` for the RISC-V VLS calling convention. It takes 0 or 1 arguments; the argument is the `ABI_VLEN`, which is the `VLEN` used for passing fixed-vector arguments: each such argument is wrapped as a scalable (VLA) vector using the `ABI_VLEN`, and the corresponding mechanism is used to handle it. The range of `ABI_VLEN` is [32, 65536]; if not specified, the default value is 128.
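For the attribute spelling itself, a small sketch (the typedef and function names are illustrative):
```
typedef int int32x4_t __attribute__((vector_size(16)));

void __attribute__((riscv_vls_cc))      default_abi_vlen(int32x4_t arg); // ABI_VLEN = 128
void __attribute__((riscv_vls_cc(512))) wide_abi_vlen(int32x4_t arg);    // ABI_VLEN = 512
```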
Here is an example of VLS argument passing:
Non-VLS call:
```
void original_call(__attribute__((vector_size(16))) int arg) {}
=>
define void @original_call(i128 noundef %arg) {
entry:
...
ret void
}
```
VLS call:
```
void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
entry:
...
ret void
}
```
The first, non-VLS call passes the generic 16-byte vector argument as a flattened integer.
In contrast, the VLS call uses `ABI_VLEN=256`, which wraps the vector into <vscale x 1 x i32>, where the number of scalable vector elements is calculated by `ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.
PsABI PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418
C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/68
`sret` arguments are always going to reside in the stack/`alloca` address space, which makes the current formulation, where their AS is derived from the pointee, somewhat quaint. This patch ensures that `sret` ends up pointing to the `alloca` AS in IR function signatures, and also guards against trying to pass a cast `alloca`'d pointer to an `sret` arg, which can happen for most languages when compiling for targets that have a non-zero `alloca` AS (e.g. AMDGCN) or that map `LangAS::Default` to a non-zero value (SPIR-V). A target could still choose to do something different here, e.g. by overriding `classifyReturnType` behaviour.
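For illustration, a rough sketch of the resulting signature shape on a target with a non-zero `alloca` AS such as AMDGCN (the IR in the comment is indicative, not verbatim compiler output):
```
typedef struct { int v[16]; } big_t;

big_t make_big(void);
// On AMDGCN (alloca address space 5), the indirect return is now expected to be
// emitted roughly as:
//   declare void @make_big(ptr addrspace(5) sret(%struct.big_t) align 4)
```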
In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the AllocaAS, as that matches what Clang is currently doing; however, if in the future a target opts to place indirect returns in some other storage with another AS, this will require revisiting.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
If we have +sme but not +sve, we would not set vscale_range on
functions. It should be valid to apply it with the same range with just
+sme, which can help mitigate some performance regressions in cases such
as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).
This reverts commit 81fc3add1e627c23b7270fe2739cdacc09063e54.
This breaks some LLDB tests, e.g.
SymbolFile/DWARF/x86/no_unique_address-with-bitfields.cpp:
lldb: ../llvm-project/clang/lib/AST/Decl.cpp:4604: unsigned int clang::FieldDecl::getBitWidthValue() const: Assertion `isa<ConstantExpr>(getBitWidth())' failed.
Save the bitwidth value as a `ConstantExpr` with the value set. Remove
the `ASTContext` parameter from `getBitWidthValue()`, so the latter
simply returns the value from the `ConstantExpr` instead of
constant-evaluating the bitwidth expression every time it is called.
Enables support for `-fcf-protection=return` on RISC-V, which requires Zicfiss. It also adds the string attribute "hw-shadow-stack" to every function if the option is set on RISC-V.
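A rough sketch of the expected effect (the flags and the IR attribute line are indicative):
```
// Sketch: compiled with something like
//   clang --target=riscv64 -march=rv64gc_zicfiss -fcf-protection=return -S -emit-llvm
// every function is expected to carry the string attribute, roughly:
//   attributes #0 = { ... "hw-shadow-stack" ... }
void protected_fn(void) {}
```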
This patch enables function multiversioning (FMV) and the `target_clones` attribute for the RISC-V target.
The proposal of `target_clones` syntax can be found at the
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/48 (which has
landed), as modified by the proposed
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85 (which adds the
priority syntax).
It supports the `target_clones` function attribute and the function multiversioning feature for the RISC-V target. An ifunc resolver function is generated for each function declared with the target_clones attribute.
The resolver function checks version support via the runtime object `__riscv_feature_bits`.
For example:
```
__attribute__((target_clones("default", "arch=+ver1", "arch=+ver2"))) int bar() {
return 1;
}
```
the corresponding resolver will look like this:
```
bar.resolver() {
__init_riscv_feature_bits();
// Check arch=+ver1
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION1) == BITMASK_OF_VERSION1) {
return bar.arch=+ver1;
} else {
// Check arch=+ver2
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION2) == BITMASK_OF_VERSION2) {
return bar.arch=+ver2;
} else {
// Default
return bar.default;
}
}
}
```
Update codegen for a function parameter with the transparent_union attribute to match that of the first union member.
This is a followup to #101738 to fix non-PPC codegen and closes #76773.
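For reference, a minimal sketch of the construct affected (names are illustrative):
```
// Sketch: a parameter of transparent-union type is now lowered like its first
// member (here, `int *`).
typedef union {
  int *ip;
  const int *cip;
} int_ptr_arg_t __attribute__((transparent_union));

void consume(int_ptr_arg_t p);
```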
According to the RISC-V integer calling convention, empty struct or union arguments and return values are ignored by C compilers which support them as a non-standard extension. This is not the case for C++, which requires them to be sized types.
Fixes #97285
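A minimal sketch of the difference (names are illustrative):
```
// Sketch: an empty struct argument under the RISC-V integer calling convention.
struct empty {}; // a non-standard extension in C; a sized (1-byte) type in C++

// In C, `e` is ignored for argument passing; in C++ it must not be ignored
// and occupies an argument register or stack slot.
int observe(struct empty e, int x);
```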
In PR #79382, I need to add a new type that derives from
ConstantArrayType. This means that ConstantArrayType can no longer use
`llvm::TrailingObjects` to store the trailing optional Expr*.
This change refactors ConstantArrayType to store a 60-bit integer and 4 bits for the integer size in bytes. This replaces the APInt field previously in the type but preserves enough information to recreate it where needed.
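A minimal sketch of the packing idea (field names here are hypothetical, not the actual ConstantArrayType members):
```
#include <stdint.h>

// Hypothetical illustration: 60 bits hold the array size itself, and 4 bits
// record the byte width needed to reconstruct the APInt on demand.
struct constant_array_size_storage {
  uint64_t size             : 60; // array element count
  uint64_t size_width_bytes : 4;  // byte width of the recreated APInt
};
```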
To reduce the number of places where the APInt is re-constructed I've
also added some helper methods to the ConstantArrayType to allow some
common use cases that operate on either the stored small integer or the
APInt as appropriate.
Resolves#85124.
For Embedded Swift, let's unblock building for RISC-V boards (e.g. ESP32-C6). This isn't trying to add full RISC-V support to Swift / Embedded Swift; it's just fixing the immediate blocker (not having SwiftInfo defined blocks all compilations).
Test updated to expect i8 gep.
Original message:
This adopts a similar behavior to AArch64 SVE, where bool vectors are
represented as a vector of chars with 1/8 the number of elements. This
ensures the vector always occupies a power of 2 number of bytes.
A consequence of this is that vbool64_t, vbool32_t, and vbool16_t can only be used with a vector length that guarantees at least 8 bits.
This adopts a similar behavior to AArch64 SVE, where bool vectors are
represented as a vector of chars with 1/8 the number of elements. This
ensures the vector always occupies a power of 2 number of bytes.
A consequence of this is that vbool64_t, vbool32_t, and vbool16_t can only be used with a vector length that guarantees at least 8 bits.
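A hedged sketch of the constraint (assuming the `riscv_rvv_vector_bits` attribute and a fixed vector length; the exact attribute spelling for mask types is an assumption here):
```
#include <riscv_vector.h>

// Sketch only, assuming -mrvv-vector-bits=256 (__riscv_v_fixed_vlen == 256):
// vbool32_t has 256/32 = 8 mask bits, exactly one char element, so it is usable;
// vbool64_t would have only 4 mask bits (< 8), so it cannot be used at this length.
typedef vbool32_t fixed_vbool32_t
    __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen / 32)));
```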
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.
The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16 (x0-x15).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.
`RVE` can be combined with all current standard extensions.
The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A stack alignment of 32 bits (rather than 128 bits).
* ilp32e isn't compatible with the D ISA extension.
If `ilp32e` or `lp64e` is used with an ISA that has any of the registers x16-x31 and f0-f31, then these registers are considered temporaries.
To be compatible with the implementation of ilp32e in GCC, we don't use aligned registers to pass variadic arguments, and we set the stack alignment to 4 bytes for types with a length of 2*XLEN.
FastCC is also supported on RVE, while GHC isn't, since there is only one available register.
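For illustration, a small sketch of the argument-register reduction (the flags and target triple are spelled as commonly used, and are assumptions here):
```
// Sketch: build with something like
//   clang --target=riscv32-unknown-elf -march=rv32e -mabi=ilp32e -S demo.c
// Under ilp32e only a0-a5 are argument registers, so the seventh argument `g`
// below is expected to be passed on the stack rather than in a register.
long demo(long a, long b, long c, long d, long e, long f, long g) {
  return a + g;
}
```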
Differential Revision: https://reviews.llvm.org/D70401
As reported in <https://github.com/llvm/llvm-project/issues/58929>,
Clang's handling of empty structs in the case of small structs that may
be eligible to be passed using the hard FP calling convention doesn't
match g++. In general, C++ record fields are never empty unless
[[no_unique_address]] is used, but the RISC-V FP ABI overrides this.
After this patch, fields of structs that contain empty records will be
ignored, even in C++, when considering eligibility for the FP calling
convention ('flattening'). It isn't explicitly noted in the RISC-V
psABI, but arrays of empty records will disqualify a struct for
consideration of using the FP calling convention in g++. This patch
matches that behaviour. The psABI issue
<https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/358> seeks
to clarify this.
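A sketch of the rule described above (types are illustrative):
```
// Sketch: eligibility for the FP calling convention ('flattening').
struct empty {};

struct still_eligible {   // the empty field is ignored (now also in C++),
  struct empty e;         // so this can still be passed in FP registers
  float f;
  double d;
};

struct disqualified {     // an array of empty records disqualifies the struct,
  struct empty e[1];      // matching g++ behaviour
  float f;
  double d;
};
```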
This patch was previously committed but reverted after a bug was found.
This recommit adds additional logic to prevent that bug (adding an extra
check for when a candidate from detectFPCCEligibleStructHelper may not
be valid).
Differential Revision: https://reviews.llvm.org/D142327
We used to convert them to M1 types in arguments and return values, which causes failures in CodeGen, since it is not legal to insert subvectors with LMUL>1 into M1 vectors.
Fixes #64266
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156779
This reverts commit 17a58b3ca7ec18585e9ea8ed8b39d72fe36fb6cb and the
minor documentation fix 569e99a471f618b7fdf045d5e96f21d3e3a7f898.
An issue was reported in https://reviews.llvm.org/D142327#inline-1510301
so reverting until it can be investigated and fixed.
`CGBuilderTy::CreateElementBitCast()` no longer does what its name suggests.
Remove the remaining in-tree uses by one of the following methods:
* drop the call entirely
* fold it to an `Address` construction
* replace it with `Address::withElementType()`
This is an NFC cleanup effort.
Reviewed By: barannikov88, nikic, jrtc27
Differential Revision: https://reviews.llvm.org/D154285
This commit breaks up CodeGen/TargetInfo.cpp into a set of *.cpp files,
one file per target. There are no functional changes, mostly just code moving.
Non-code-moving changes are:
* A virtual destructor has been added to DefaultABIInfo to pin the vtable to a cpp file.
* A few methods of ABIInfo and DefaultABIInfo were split into declaration + definition
in order to reduce the number of transitive includes.
* Several functions that used to be static have been placed in clang::CodeGen
namespace so that they can be accessed from other cpp files.
RFC: https://discourse.llvm.org/t/rfc-splitting-clangs-targetinfo-cpp/69883
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D148094