The 'loop' construct has some pretty strict checks as to what the for
loop associated with it has to look like. The previous implementation
was a little convoluted and missed implementing the 'condition'
checking. Additionally, it did a bad job double-diagnosing with
templates.
This patch rewrites it in a way that does a much better job with the
double-diagnosing, and proeprly implements some checks for the
'condition' of the for loop.
This Change adds support for two SiFive vendor attributes in clang:
- "SiFive-CLIC-preemptible"
- "SiFive-CLIC-stack-swap"
These can be given together, and can be combined with "machine", but
cannot be combined with any other interrupt attribute values.
These are handled primarily in RISCVFrameLowering:
- "SiFive-CLIC-stack-swap" entails swapping `sp` with `sf.mscratchcsw`
at function entry and exit, which holds the trap stack pointer.
- "SiFive-CLIC-preemptible" entails saving `mcause` and `mepc` before
re-enabling interrupts using `mstatus`. To save these, `s0` and `s1`
are first spilled to the stack, and then the values are read into
these registers. If these registers are used in the function, their
values will be spilled a second time onto the stack with the generic
callee-saved-register handling. At the end of the function interrupts
are disabled again before `mepc` and `mcause` are restored.
This Change also adds support for the following two experimental
extensions, which only contain CSRs:
- XSfsclic - for SiFive's CLIC Supervisor-Mode CSRs
- XSfmclic - for SiFive's CLIC Machine-Mode CSRs
The latter is needed for interrupt support.
The CFI information for this implementation is not correct, but I'd
prefer to correct this in a follow-up. While it's unlikely anyone wants
to unwind through a handler, the CFI information is also used by
debuggers so it would be good to get it right.
Co-authored-by: Ana Pazos <apazos@quicinc.com>
The 'wait' clause is a bit complicated, and is laid out awkwardly in the
IR addative functions, so this patch has to do a little bit of work to
do that (mostly the 'devnum' work).
Otherwise, this is very similar to how num_gangs works, with the
additional complexity of the 'empty' wait being represented differently
as well, but this is similar to how 'async' and a few others work as
well.
Unlike C++, C allows the definition of an uninitialized `const` object.
If the object has static or thread storage duration, it is still
zero-initialized, otherwise, the object is left uninitialized. In either
case, the code is not compatible with C++.
This adds a new diagnostic group, `-Wdefault-const-init-unsafe`, which
is on by default and diagnoses any definition of a `const` object which
remains uninitialized.
It also adds another new diagnostic group, `-Wdefault-const-init` (which
also enabled the `unsafe` variant) that diagnoses any definition of a
`const` object (including ones which are zero-initialized). This
diagnostic is off by default.
Finally, it adds `-Wdefault-const-init` to `-Wc++-compat`. GCC diagnoses
these situations under this flag.
Fixes#19297
When records contain fields with pointer authentication, even simple
copies can require
additional work be performed. This patch contains the core functionality
required to
handle user defined structs, as well as the implicitly constructed
structs for blocks, etc.
Co-authored-by: Ahmed Bougacha
Co-authored-by: Akira Hatanaka
Co-authored-by: John Mccall
Resolves#99114.
Tasks completed:
- Implement `faceforward` in
`hlsl_intrinsics.h`/`hlsl_intrinsic_helpers.h`
- Implement `faceforward` SPIR-V target builtin in
`clang/include/clang/Basic/BuiltinsSPIRV.td`
- Add a SPIR-V fast path in `hlsl_intrinsic_helpers.h`
- Add sema checks for `faceforward` to `CheckSPIRVBuiltinFunctionCall`
in `clang/lib/Sema/SemaSPIRV.cpp`
- Add codegen for SPIR-V `faceforward` builtin to `EmitSPIRVBuiltinExpr`
in `SPIR.cpp`
- Add HLSL codegen tests to
`clang/test/CodeGenHLSL/builtins/faceforward.hlsl`
- Add SPIRV builtin codegen tests to
`clang/test/CodeGenSPIRV/Builtins/faceforward.c`
- Add sema tests to
`clang/test/SemaHLSL/BuiltIns/faceforward-errors.hlsl`
- Add spirv sema tests to
`clang/test/SemaSPIRV/BuiltIns/faceforward-errors.c`
- Create the `int_spv_faceforward` intrinsic in `IntrinsicsSPIRV.td`
- In `SPIRVInstructionSelector.cpp` create the `faceforward` lowering
and map it to `int_spv_faceforward` in
`SPIRVInstructionSelector::selectIntrinsic`
- Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/faceforward.ll`
Incomplete tasks:
- Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/opencl/faceforward.ll`
- Not applicable because the OpenCL SPIR-V extended instruction set does
not include a `faceforward` function
Follow-up tasks:
- Implement pattern matching for `faceforward` in `SPIRVCombine.td` and
`SPIRVPreLegalizerCombiner.cpp`
- In `faceforward.ll`, change `--target-env spv1.4` to `vulkan1.3` and
update the test accordingly once
[#136344](https://github.com/llvm/llvm-project/issues/136344) has been
resolved
This introduces a new diagnostic group (-Wimplicit-void-ptr-cast),
grouped under -Wc++-compat, which diagnoses implicit conversions from
void * to another pointer type in C. It's a common source of
incompatibility with C++ and is something GCC diagnoses (though GCC does
not have a specific warning group for this).
Fixes#17792
Static analysis flagged this code b/c we should have been using
std::move when passing by value since the value is not used anymore. In
this case the simpler fix is just to use a temporary value as many of
the other cases where we simply use MakeIntValue to then create an
APValue result from it.
In a lambda function, a call of a function may
resolve to host and device functions with different
signatures. Especially, a constexpr local variable may
be passed by value by the device function and
passed by reference by the host function, which
will cause the constexpr variable captured by
the lambda function in host compilation but
not in the device compilation. The discrepancy
in the lambda captures will violate ODR and
causes UB for kernels using these lambdas.
This PR fixes the issue by identifying
discrepancy of ODR/non-ODR usages of constexpr
local variables passed to host/device functions
and conservatively capture them.
Fixes: https://github.com/llvm/llvm-project/issues/132068
Previously, analysis-based diagnostics (like -Wconsumed) had to be
enabled at file scope in order to be run at the end of each function
body. This meant that they did not respect #pragma clang diagnostic
enabling or disabling the diagnostic.
Now, these pragmas can control the diagnostic emission.
Fixes#42199
FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.
- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2
FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so different amounts of bytes are
being pushed onto the stack depending on our target, which would
affect alignment for subsequent saves.
DPRCS1 will sum up all previous bytes that were saved, and will emit
extra instructions to ensure that its alignment is correct. My
assumption is that if DPRCS1 is able to correct its alignment to be
correct, then all subsequent saves will also have correct alignment.
Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup. Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.
Co-authored-by: Jake Vossen <jake@vossen.dev>
Co-authored-by: Alan Phipps <a-phipps@ti.com>
Async acts just like num_workers/vector_length in that it gets a new
variant per device_type and is lowered as an operand.
However, it has one additional complication, in that it can have a
variant that has no argument, which produces an attribute with the
correct devicetype.
Additionally, this syncronizes us with the implementation of flang,
which prohibits multiple 'async' clauses per-device_type.
Instead of defining ARM ACLE intrinsics only on MSVC and guarding
wrapper functions in headers with `__has_builtin`, universally define
the intrinsics as target header builtins.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Reland #134174
Update Sema Checking to always do an HLSL Array RValue cast in the case
we are dealing with hlsl constant array types
Instead of comparing canonical types, compare canonical unqualified
types
Add a test to show it is possible to assign an array from a cbuffer.
Closes https://github.com/llvm/llvm-project/issues/133767
Update Sema Checking to always do an HLSL Array RValue cast in the case
we are dealing with hlsl constant array types
Instead of comparing canonical types, compare canonical unqualified
types
Add a test to show it is possible to assign an array from a cbuffer.
Closes#133767
Very simply extends the bitfield sema checks for assignment to fields
with a preferred type specified to consider the preferred type if the
decl storage type is not explicitly an enum type.
This does mean that if the preferred and explicit types have different
storage requirements we may not warn in all possible cases, but that's a
scenario for which the warnings are much more complex and confusing.
Previously using the `counted_by` or `counted_by_or_null` attribute on a
pointer with an incomplete pointee type was forbidden. Unfortunately
this prevented a situation like the following from being allowed.
Header file:
```
struct EltTy; // Incomplete type
struct Buffer {
size_t count;
struct EltTy* __counted_by(count) buffer; // error before this patch
};
```
Implementation file:
```
struct EltTy {
// definition
};
void allocBuffer(struct Buffer* b) {
b->buffer = malloc(sizeof(EltTy)* b->count);
}
```
To allow code like the above but still enforce that the pointee
type size is known in locations where `-fbounds-safety` needs to
emit bounds checks the following scheme is used.
* For incomplete pointee types that can never be completed (e.g. `void`)
these are treated as error where the attribute is written (just like
before this patch).
* For incomplete pointee types that might be completable later on
(struct, union, and enum forward declarations)
in the translation unit, writing the attribute on the incomplete
pointee type is allowed on the FieldDecl declaration but "uses" of the
declared pointer are forbidden if at the point of "use" the pointee
type is still incomplete.
For this patch a "use" of a FieldDecl covers:
* Explicit and Implicit initialization (note see **Tentative Definition
Initialization** for an exception to this)
* Assignment
* Conversion to an lvalue (e.g. for use in an expression)
In the swift lang fork of Clang the `counted_by` and
`counted_by_or_null` attribute are allowed in many more contexts. That
isn't the case for upstream Clang so the "use" checks for the attribute
on VarDecl, ParamVarDecl, and function return type have been omitted
from this patch because they can't be tested. However, the
`BoundsSafetyCheckAssignmentToCountAttrPtrWithIncompletePointeeTy` and
`BoundsSafetyCheckUseOfCountAttrPtrWithIncompletePointeeTy` functions
retain the ability to emit diagnostics for these other contexts to avoid
unnecessary divergence between upstream Clang and Apple's internal fork.
Support for checking "uses" will be upstreamed when upstream Clang
allows the `counted_by` and `counted_by_or_null` attribute in additional
contexts.
This patch has a few limitations:
** 1. Tentative Defition Initialization **
This patch currently allows something like:
```
struct IncompleteTy;
struct Buffer {
int count;
struct IncompleteTy* __counted_by(count) buf;
};
// Tentative definition
struct Buffer GlobalBuf;
```
The Tentative definition in this example becomes an actual definition
whose initialization **should be checked** but it currently isn't.
Addressing this problem will be done in a subseqent patch.
** 2. When the incomplete pointee type is a typedef diagnostics are slightly misleading **
For this situation:
```
struct IncompleteTy;
typedef struct IncompleteTy Incomplete_t;
struct Buffer {
int count;
struct IncompleteTy* __counted_by(count) buf;
};
void use(struct Buffer b) {
b.buf = 0x0;
}
```
This code emits `note: forward declaration of 'Incomplete_t' (aka
'struct IncompleteTy')` but the location is on the `struct
IncompleteTy;` forward declaration. This is misleading because
`Incomplete_t` isn't actually forward declared there (instead the
underlying type is). This could be resolved by additional diagnostics
that walk the chain of typedefs and explain each step of the walk.
However, that would be very verbose and didn't seem like a direction
worth pursuing.
rdar://133600117
…uses
The Flang implemenation of OpenACC uses a .td file in the llvm/Frontend
directory to determine appertainment in 4 categories:
-Required: If this list has items in it, the directive requires at least
1 of these be present.
-AllowedExclusive: Items on this list are all allowed, but only 1 from
the list may be here (That is, they are exclusive of eachother).
-AllowedOnce: Items on this list are all allowed, but may not be
duplicated.
Allowed: Items on this list are allowed. Note th at the actual list of
'allowed' is all 4 of these lists together.
This is a draft patch to swtich Clang over to use those tables. Surgery
to get this to happen in Clang Sema was somewhat reasonable. However,
some gaps in the implementations are obvious, the existing clang
implementation disagrees with the Flang interpretation of it. SO, we're
keeping a task list here based on what gets discovered.
Changes to Clang:
- [x] Switch 'directive-kind' enum conversions to use tablegen See
ff1a7bddd9435b6ae2890c07eae60bb07898bbf5
- [x] Switch 'clause-kind' enum conversions to use tablegen See
ff1a7bddd9435b6ae2890c07eae60bb07898bbf5
- [x] Investigate 'parse' test differences to see if any new
disagreements arise.
- [x] Clang/Flang disagree as to whether 'collapse' can be multiple
times on a loop. Further research showed no prose to limit this, and the
comment on the clang implementation said "no good reason to allow", so
no standards justification.
- [x] Clang/Flang disagree whether 'num_gangs' can appear >1 on a
compute/combined construct. This ended up being an unjustified
restriction.
- [x] Clang/Flang disagree as to the list of required clauses on a 'set'
construct. My research shows that Clang mistakenly included 'if' in the
list, and that it should be just 'default_async', 'device_num', and
'device_type'.
- [x] Order of 'at least one of' diagnostic has changed. Tests were
updated.
- [x] Ensure we are properly 'de-aliasing' clause names in appertainment
checks?
- [x] What is 'shortloop'? 'shortloop' seems to be an old non-standard
extension that isn't supported by flang, but is parsed for backward
compat reasons. Clang won't parse, but we at least have a spot for it in
the clause list.
- [x] Implemented proposed change for 'routine' gang/worker/vector/seq.
(see issue 539)
- [x] Implement init/shutdown can only have 1 'if' (see issue 540)
- [x] Clang/Flang disagree as to whether 'tile' is permitted more than
once on a 'loop' or combined constructs (Flang prohibits >1). I see no
justification for this in the standard. EDIT: I found a comment in clang
that I did this to make SOMETHING around duplicate checks easier.
Discussion showed we should actually have a better behavior around
'device_type' and duplicates, so I've since implemented that.
- [x] Clang/Flang disagree whether 'gang', 'worker', or 'vector' may
appear on the same construct as a 'seq' on a 'loop' or 'combined'. There
is prose for this in 2022: (a gang, worker, or vector clause may not
appear if a 'seq' clause appears). EDIT: These don't actually disagree,
but aren't in the .td file, so I restored the existing code to do this.
- [x] Clang/Flang disagree on whether 'bind' can appear >1 on a
'routine'. I believe line 3096 (A bind clause may not bind to a routine
name that has a visible bind clause) makes this limitation (Flang
permits >1 bind). we discussed and decided this should have the same
rules as worker/vector/etc, except without the 'exactly 1 of' rule (so
no dupes in individual sections).
- [x] Clang/Flang disagree on whether 'init'/'shutdown' can have
multiple 'device_num' clauses. I believe there is no supporting prose
for this limitation., We decided that `device_num` should only happen
1x.
- [x] Clang/Flang disagree whether 'num_gangs' can appear >1 on a
'kernels' construct. Line 1173 (On a kernels construct, the num_gangs
clause must have a single argument) justifies limiting on a
per-arguement basis, but doesn't do so for multiple num_gangs clauses.
WE decided to do this with the '1-per-device-type' region for num_gangs,
num_workers, and vector_length, see openacc bug here:
https://github.com/OpenACC/openacc-spec/issues/541
Changes to Flang:
- [x] Clang/Flang disgree on whether 'atomic' can take an 'if' clause.
This was added in OpenACC3.3_Next See #135451
- [x] Clang/Flang disagree on whether 'finalize' can be allowed >1 times
on a 'exit_data' construct. see #135415.
- [x] Clang/Flang disagree whether 'if_present' should be allowed >1
times on a 'host_data'/'update' construct. see #135422
- [x] Clang/Flang disagree on whether 'init'/'shutdown' can have
multiple 'device_type' clauses. I believe there is no supporting prose
for this limitation.
- [ ] SEE change for num_gangs/etc above.
Changes that need discussion/research:
OpenMP has restrictions on directives allowed to be strictly nested
inside a
construct with the order(concurrent) clause specified.
- OpenMP 5.0, 5.1, and 5.2 allows: 'loop', 'parallel', 'simd', and
combined directives starting with 'parallel'.
- OpenMP 6.0 allows: the above directives plus 'atomic' and
all loop-transformation directives.
Furthermore, a region that corresponds to a construct with
order(concurrent)
specified may not contain calls to the OpenMP runtime API.
This PR fixes the following issues in the current implementation:
With -fopenmp-version=50: none of the nesting restrictions above were
enforced
With -fopenmp-version=60:
1. Clang did not reject OpenMP runtime APIs encountered in the region.
2. Clang erroneously rejected combined directives starting with
parallel.
---------
Co-authored-by: Zahira Ammarguellat <zahira.ammarguellat@intel.com>
The MSVC STL includes specializations of `_Is_memfunptr` for every
function pointer type, including every calling convention.
The problem is the AMDGPU target doesn't support the x86 `vectorcall`
calling convention so clang sets it to the default CC. This ends up
clashing with the already-existing overload for the default CC, so we
get a duplicate definition error when including `type_traits` (which we
heavily use in the SYCL STL) and compiling for AMDGPU on Windows.
This doesn't happen for pure AMDGPU non-SYCL because it doesn't include
the C++ STL, and it doesn't happen for CUDA/HIP because a similar
workaround was done
[here](fa49c3a888).
I am not an expert in Sema, so I did a kinda of hardcoded fix, please
let me know if there is a better way to fix this.
As far as I can tell we can't do exactly the same fix that was done for
CUDA because we can't differentiate between device and host code so
easily.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This patch replaces:
llvm::copy(Src, std::back_inserter(Dst));
with:
llvm::append_range(Dst, Src);
for breavity.
One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is of llvm::SmallVector.
This PR reland https://github.com/llvm/llvm-project/pull/135808, fixed
some missed changes in LLDB.
I found this issue when I working on
https://github.com/llvm/llvm-project/pull/107168.
Currently we have many similiar data structures like:
- std::pair<IdentifierInfo *, SourceLocation>.
- Element type of ModuleIdPath.
- IdentifierLocPair.
- IdentifierLoc.
This PR unify these data structures to IdentifierLoc, moved
IdentifierLoc definition to SourceLocation.h, and deleted other similer
data structures.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>