For a pointer int *p, the test currently crashes for the following maps:
map(p, p[:100]) or map(p, p[1])
Currently the IR looks like
// &p, &p, sizeof(int), TARGET_PARAM | TO | FROM
// &p, p[0], 100*sizeof(float) TO | FROM
The working IR is
// map(p, p[0:100]) to map(p[0:100])
// &p, &p[0], 100*sizeof(float), TARGET_PARAM | TO | FROM | PTR_AND_OBJ
The change adds a new argument, AreBothBasePtrAndPteeMapped, to
generateInfoForComponentList.
It is used to skip the map for map(p) and, when processing map(p[:100]), to
generate the map with the right flags.
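A minimal sketch of source that exercises this combination of maps. The function name `increment_all` is hypothetical, and the sketch assumes that when the compiler is invoked without OpenMP support the pragma is ignored and the loop runs on the host.

```cpp
#include <cassert>

// Hypothetical helper: maps both the pointer `p` and the pointee section
// `p[:n]` on a single target construct, the combination described above.
// Assumption: without OpenMP enabled the pragma is ignored and the loop
// simply executes on the host.
int increment_all(int *p, int n) {
  assert(p != nullptr && n >= 0);
#pragma omp target map(p, p[:n])
  for (int i = 0; i < n; ++i)
    p[i] += 1;

  // Sum on the host so the caller can check that the data came back.
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += p[i];
  return sum;
}
```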
Make sure that empty structs are treated as if they have a size of one bit
in function parameters and return types so that they occupy a full
argument and/or return register slot.
This fixes crashes and miscompilations when passing and/or returning
empty structs.
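A small sketch of the kind of code this affects: an empty struct passed ahead of an ordinary argument and returned by value. All names here are hypothetical, and the snippet is target independent, so it only illustrates the source pattern rather than the backend behaviour.

```cpp
#include <cassert>

struct Empty {};

// An empty struct passed ahead of an ordinary argument; the later argument
// must still arrive intact.
int second_arg(Empty, int x) { return x; }

// An empty struct returned by value.
Empty make_empty() { return Empty{}; }

// Round-trips an empty struct through a call and returns the trailing argument.
int round_trip(int x) {
  Empty e = make_empty();
  return second_arg(e, x);
}

// Usage: the value passed after the empty struct should be unchanged.
void self_check() { assert(round_trip(42) == 42); }
```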
Reviewed by: @s-barannikov
The C++ standard requires that symmetric transfer from one coroutine to
another is performed via a tail call. Failure to do so is a miscompile
and often breaks programs by quickly overflowing the stack.
Until now, the coro split pass tried to ensure this in the
`addMustTailToCoroResumes()` function by searching for
`llvm.coro.resume` calls to lower as tail calls if the conditions were
right: the right function arguments, attributes, calling convention
etc., and if a `ret void` was sure to be reached after traversal with
some ad-hoc constant folding following the call.
This was brittle, as the kind of implicit invariants required for a tail
call to happen could easily be broken by other passes (e.g. if some
instruction got in between the `resume` and `ret`), see for example
9d1cb18d19862fc0627e4a56e1e491a498e84c71 and
284da049f5feb62b40f5abc41dda7895e3d81d72.
Also the logic seemed backwards: instead of searching for possible tail
call candidates and doing them if the circumstances are right, it seems
better to start with the intention of making the tail calls we need, and
forcing the circumstances to be right.
Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since
f78688134026686288a8d310b493d9327753a022) which corresponds exactly to
symmetric transfer, change the lowering of that to also include the
`resume` part, always lowered as a tail call.
PR #80680 added bits in the codegen to lazily add convergence intrinsics
when required. This logic relied on the LoopStack. The issue is that when
parsing the condition, the LoopStack doesn't yet reflect the correct
values, as expected since we are not yet in the loop.
However, convergence tokens should sometimes already be available. The
solution which seemed the simplest is to greedily generate the tokens
when we generate SPIR-V.
Fixes #88144
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
PR #87090 amended `accumulateBitfields` to do the correct clipping. The scissor is no longer necessary and `checkBitfieldClipping` can compute its location directly when needed.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
24 under clang/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
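A short sketch of the comparison style being argued for. It uses `std::string_view` as a stand-in for `llvm::StringRef` (an assumption, so that the snippet needs no LLVM headers), and the helper names are hypothetical.

```cpp
#include <cassert>
#include <string_view>

// std::string_view stands in for llvm::StringRef here (an assumption made so
// the sketch compiles without LLVM); both compare against string literals
// with operator== and operator!=.
bool is_foo(std::string_view s) { return s == "foo"; }

// The negated form reads as `s != "str"` rather than `!s.equals("str")`.
bool is_not_str(std::string_view s) { return s != "str"; }

// Usage example.
void self_check() {
  assert(is_foo("foo"));
  assert(is_not_str("foo"));
}
```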
Adds a builtin and intrinsic for the f32.store_f16 instruction.
The instruction stores an f32 value as an f16 in memory. Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f32.store_f16 as opcode 0xFD0121, but this is
incorrect and will be changed to 0xFC31 soon.
Depends on #87545
Emit PAuth ABI compatibility tag values as llvm module flags:
- `aarch64-elf-pauthabi-platform`
- `aarch64-elf-pauthabi-version`
For platform 0x10000002 (llvm_linux), the version value bits correspond
to the following LangOptions defined in #85232:
- bit 0: `PointerAuthIntrinsics`;
- bit 1: `PointerAuthCalls`;
- bit 2: `PointerAuthReturns`;
- bit 3: `PointerAuthAuthTraps`;
- bit 4: `PointerAuthVTPtrAddressDiscrimination`;
- bit 5: `PointerAuthVTPtrTypeDiscrimination`;
- bit 6: `PointerAuthInitFini`.
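The bit layout above can be sketched as a small encoding function. The struct and function names below are hypothetical and are not Clang's; only the bit positions come from the list.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the LangOptions listed above.
struct PAuthOptions {
  bool Intrinsics = false;                 // bit 0
  bool Calls = false;                      // bit 1
  bool Returns = false;                    // bit 2
  bool AuthTraps = false;                  // bit 3
  bool VTPtrAddressDiscrimination = false; // bit 4
  bool VTPtrTypeDiscrimination = false;    // bit 5
  bool InitFini = false;                   // bit 6
};

// Packs the options into the version value, one bit per option.
std::uint64_t pauthabi_version(const PAuthOptions &o) {
  return (static_cast<std::uint64_t>(o.Intrinsics) << 0) |
         (static_cast<std::uint64_t>(o.Calls) << 1) |
         (static_cast<std::uint64_t>(o.Returns) << 2) |
         (static_cast<std::uint64_t>(o.AuthTraps) << 3) |
         (static_cast<std::uint64_t>(o.VTPtrAddressDiscrimination) << 4) |
         (static_cast<std::uint64_t>(o.VTPtrTypeDiscrimination) << 5) |
         (static_cast<std::uint64_t>(o.InitFini) << 6);
}

// Usage: with nothing enabled the version value is zero.
void self_check() { assert(pauthabi_version(PAuthOptions{}) == 0); }
```

For example, enabling only calls and returns sets bits 1 and 2, giving a version value of 6.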
---------
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
After 11a6799740f8 "[clang][CodeGen] Omit pre-opt link when post-opt is
link requested (#85672)" I'm seeing a new warning:
> BackendConsumer.h:37:22: error: private field 'FileMgr' is not used
[-Werror,-Wunused-private-field]
Remove the field since it's no longer used.
Currently, when the -relink-builtin-bitcodes-postop option is used we
link builtin bitcodes twice: once before optimization, and again after
optimization.
With this change, we omit the pre-opt linking when the option is set,
and we rename the option to the following:
-Xclang -mlink-builtin-bitcodes-postopt
(-Xclang -mno-link-builtin-bitcodes-postopt)
The goal of this change is to reduce compile time. We do lose the
theoretical benefits of pre-opt linking, but in practice these are smaller
than the overhead of linking twice. However, we may be able to address
this in a future patch by adjusting the position of the builtin-bitcode
linking pass.
Compilations not setting the option are unaffected.
This change implements #87367's investigation into supporting
IEEE math operations as intrinsics,
which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
If you want an overarching view of how this will all connect, see:
https://github.com/llvm/llvm-project/pull/90088
Changes:
- `clang/docs/LanguageExtensions.rst` - Document the new elementwise tan
builtin.
- `clang/include/clang/Basic/Builtins.td` - Implement the tan builtin.
- `clang/lib/CodeGen/CGBuiltin.cpp` - Invoke the tan intrinsic on uses
of the builtin.
- `clang/lib/Headers/hlsl/hlsl_intrinsics.h` - Associate the tan builtin
with the equivalent HLSL APIs.
- `clang/lib/Sema/SemaChecking.cpp` - Add generic sema checks as well as
HLSL-specific sema checks to the tan builtin.
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/docs/LangRef.rst` - Document the tan intrinsic
For global functions and static methods the MSVC ABI returns
structs/classes with a deleted copy assignment operator indirectly.
From local testing this ABI holds true for all currently supported
architectures including ARM64EC.
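A minimal sketch of the kind of type and functions this rule concerns. The names are hypothetical, and the snippet only shows the source pattern; how the value is actually returned depends on the target ABI.

```cpp
#include <cassert>
#include <type_traits>

// A class with a deleted copy assignment operator.
struct NoCopyAssign {
  int value;
  NoCopyAssign &operator=(const NoCopyAssign &) = delete;
};

static_assert(!std::is_copy_assignable<NoCopyAssign>::value,
              "copy assignment is deleted");

// A global function returning such a type by value.
NoCopyAssign make_global(int v) { return NoCopyAssign{v}; }

// A static method returning such a type by value.
struct Factory {
  static NoCopyAssign make_static(int v) { return NoCopyAssign{v}; }
};

// Usage: the returned value must be intact regardless of how it is returned.
void self_check() { assert(make_global(1).value == 1); }
```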
Adds a builtin and intrinsic for the f32.load_f16 instruction.
The instruction loads an f16 value from memory and puts it in an f32.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f32.load_f16 as opcode 0xFD0120, but this is
incorrect and will be changed to 0xFC30 soon.
When set, the compiler will use separate unique sections for global
symbols in named special sections (e.g. symbols that are annotated with
__attribute__((section(...)))). Doing so enables linker GC to collect
unused symbols without having to use a different section per-symbol.
When using a hard-float ABI for a target without FP registers, it's not
possible to correctly generate code for functions with arguments which
must be passed in floating-point registers. This is diagnosed in CodeGen
instead of Sema, to more closely match GCC's behaviour around inline
functions, which is relied on by the Linux kernel.
Previously, this only checked function signatures as they were
code-generated, but this missed some cases:
* Calls to functions not defined in this translation unit.
* Calls through function pointers.
* Calls to variadic functions, where the variadic arguments have a
floating-point type.
This adds checks to function calls, as well as definitions, so that
these cases are correctly diagnosed.
This fixes issue #87758, where fast-math flags are not
propagated to all builtins.
Pragmas with fast-math flags were only propagated to calls
of unary floating-point builtins. This patch also propagates them to
binary and ternary floating-point builtins.
AArch64 features such as Branch Target Identification (BTI) or the Memory
Tagging Extension (MTE) are compatible with targets that lack support for
them. A binary compiled with MTE, for example, may be executed on both MTE
and non-MTE hardware, so we still need to detect at runtime whether the
feature is available in order to choose the appropriate function version.
Therefore, we cannot optimize the function multiversioning resolver by
removing checks for these features, even when they are enabled for the
target during compilation.
https://wg21.link/P2809R3
This is applied as a DR to C++11 (C++98 did not guarantee forward
progress and is left untouched).
As an extension (and to preserve existing behavior in C), we consider
all controlling expressions that can be constant folded
in the front end, not just standard constant expressions.
Prior to this change the debug-location for the
`llvm.instrprof.increment` intrinsic was set to whatever the
DIBuilder's current debug location was. This meant that for
switch-statements, a counter's location was set to the previous case's
debug-location, causing confusing stepping behaviour in debuggers. This
patch makes sure we attach a dummy debug-location for the increment
instructions.
rdar://123050737
Make sure that the result from the ctpop/ctlz/cttz intrinsics is
cast to int as an unsigned value, rather than as a signed value, when
expanding the __builtin_popcountg/__builtin_ctzg/__builtin_clzg
builtins.
An example would be
unsigned _BitInt(1) x = ...;
int y = __builtin_popcountg(x);
which previously was incorrectly expanded to
%1 = call i1 @llvm.ctpop.i1(i1 %0)
%cast = sext i1 %1 to i32
Since the input type is generic for those "g" versions of the builtins,
the intrinsic call may return a value for which the sign bit is set
(this can typically happen for _BitInt of size 1 and 2). So we need to
emit a zext rather than a sext to avoid negative results.
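The difference between the two extensions can be worked through with a small model. The helper names below are hypothetical; they emulate sext and zext of a narrow value to 32 bits and are not the code Clang emits.

```cpp
#include <cassert>
#include <cstdint>

// Keeps only the low `width` bits of `value` (1 <= width <= 32).
static std::uint32_t low_bits(std::uint32_t value, unsigned width) {
  assert(width >= 1 && width <= 32);
  std::uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
  return value & mask;
}

// Models `sext iN %v to i32`: the top bit of the narrow value is replicated.
std::int32_t sext_from_width(std::uint32_t value, unsigned width) {
  std::uint32_t v = low_bits(value, width);
  std::uint32_t sign = 1u << (width - 1);
  return static_cast<std::int32_t>((v ^ sign) - sign);
}

// Models `zext iN %v to i32`: the upper bits are filled with zeros.
std::int32_t zext_from_width(std::uint32_t value, unsigned width) {
  return static_cast<std::int32_t>(low_bits(value, width));
}
```

A population count of 1 held in an i1 becomes -1 under sign extension but stays 1 under zero extension; likewise a count of 2 held in an i2 becomes -2 versus 2.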
This patch fixes debug records in clang, by adding support for debug
records to the only remaining place that refers to DbgVariableIntrinsics
directly and does not handle DbgVariableRecords.
This PR implements a subset of the features of
`__builtin_cpu_supports()` for AIX, based on the information that the AIX
kernel runtime variable `_system_configuration` and the function `getsystemcfg()` from
/usr/include/sys/systemcfg.h can provide.
The following subset of features is supported in this PR:
"arch_3_00", "arch_3_1", "booke", "cellbe", "darn", "dfp", "dscr", "ebb", "efpsingle", "efpdouble", "fpu", "htm", "isel",
"mma", "mmu", "pa6t", "power4", "power5", "power5+", "power6x", "ppc32", "ppc601", "ppc64", "ppcle", "smt",
"spe", "tar", "true_le", "ucache", "vsx"
Place constant initializer globals into the constant address space.
Clang generates such globals for e.g. larger array member initializers
of classes and then emits copy operations from the global to the
object(s). The globals are never written so they ought to be in the
constant address space.
The MSVC linker merges comdat functions that have an identical set of
instructions (identical COMDAT folding, ICF). CUDA uses the kernel stub
function as the key to look up kernels in device executables. If the
kernel stub functions for different kernels are merged by ICF, incorrect
kernels will be launched.
To prevent ICF from merging kernel stub functions, a unique global
variable is created for each kernel stub function having comdat and a
store to it is added to the kernel stub function. This makes the set of
instructions in each kernel stub function unique.
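The idea can be sketched in plain C++. The stub and variable names are hypothetical stand-ins, and the stored value is an assumption; the sketch only shows how a store to a per-function global makes otherwise identical bodies differ.

```cpp
#include <cassert>

// Hypothetical stand-ins for two kernel stubs. Without the stores below the
// two bodies would be identical, which is what makes them candidates for
// folding by the linker.
volatile char kernel_a_stub_tag = 0; // unique global for stub A
volatile char kernel_b_stub_tag = 0; // unique global for stub B

void kernel_a_stub() { kernel_a_stub_tag = 1; }
void kernel_b_stub() { kernel_b_stub_tag = 1; }

using StubFn = void (*)();

// The stub address is the lookup key, so distinct kernels need distinct
// addresses.
bool stubs_have_distinct_addresses() {
  StubFn a = &kernel_a_stub;
  StubFn b = &kernel_b_stub;
  return a != b;
}

// Usage example.
void self_check() { assert(stubs_have_distinct_addresses()); }
```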
Fixes: https://github.com/llvm/llvm-project/issues/88883
When building the Linux kernel for i386, the -mregparm=3 option is
enabled. Crashes were observed in the sanitizer handler functions, and
the problem was found to be mismatched calling convention.
As was fixed in commit c167c0a4dcdb ("[BuildLibCalls] infer inreg param
attrs from NumRegisterParameters"), call arguments need to be marked as
"in register" when -mregparm is set. Use the same helper developed there
to update the function arguments.
Since CreateRuntimeFunction() is actually part of CodeGenModule, storage
of the -mregparm value is also moved to the constructor, as doing this
in Release() is too late.
Fixes: https://github.com/llvm/llvm-project/issues/89670
Currently, many `__builtin_reduce_*` functions do not support
scalable vectors, i.e., ARM SVE and RISC-V V. This PR adds support for
them. The main code change is to use a different path to extract the
type from the vectors; the rest is the same, and LLVM supports the reduce
functions for `vscale` vectors.
This PR adds scalable vector support for:
- `__builtin_reduce_add`
- `__builtin_reduce_mul`
- `__builtin_reduce_xor`
- `__builtin_reduce_or`
- `__builtin_reduce_and`
- `__builtin_reduce_min`
- `__builtin_reduce_max`
Note: For all except `min/max`, the element type must still be an
integer type. Adding floating-point support for `add` and `mul` is
still an open TODO.
PR https://github.com/llvm/llvm-project/pull/89567 fixed the `#pragma
unroll N` crash in dependent contexts, but it introduced a new
issue:
Since https://github.com/llvm/llvm-project/pull/89567, if `N` is value
dependent, 'option' and 'state' were `(LoopHintAttr::Unroll,
LoopHintAttr::Enable)`. Therefore, clang's code generator generated
incorrect IR metadata.
For the situation `#pragma {GCC} unroll {0|1}`, before template
instantiation, this PR sets 'option' to `LoopHintAttr::UnrollCount`
and 'state' to `LoopHintAttr::Numeric`. During template instantiation,
if the unroll count is 0 or 1, this PR sets 'option' to
`LoopHintAttr::Unroll` and 'state' to `LoopHintAttr::Disable`. We don't
use `LoopHintAttr::UnrollCount` here because it would emit redundant
LLVM IR metadata `!{!"llvm.loop.unroll.count", i32 1}` when the unroll
count is 1.
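A minimal sketch of a value-dependent unroll count, the situation discussed above. The function names are hypothetical, and the sketch assumes that compilers which do not recognise `#pragma unroll` ignore it, so the loop result is the same either way.

```cpp
#include <cassert>

// `N` is value dependent inside the template, so the unroll hint can only be
// resolved when the template is instantiated.
template <int N>
int sum_with_unroll_hint(const int *data, int count) {
  assert(count >= 0);
  int total = 0;
#pragma unroll N
  for (int i = 0; i < count; ++i)
    total += data[i];
  return total;
}

// Instantiations with a count of 4 and with a count of 1.
int sum_unroll_4(const int *d, int c) { return sum_with_unroll_hint<4>(d, c); }
int sum_unroll_1(const int *d, int c) { return sum_with_unroll_hint<1>(d, c); }
```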
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Latest diff:
f1ab4c2677..adf9bc902b
We address two additional bugs here:
### Problem 1: Deactivated normal cleanup still runs, leading to double-free
Consider the following:
```cpp
struct A { };
struct B { B(const A&); };
struct S {
A a;
B b;
};
int AcceptS(S s);
void Accept2(int x, int y);
void Test() {
Accept2(AcceptS({.a = A{}, .b = A{}}), ({ return; 0; }));
}
```
We add cleanups as follows:
1. push dtor for field `S::a`
2. push dtor for temp `A{}` (used by ` B(const A&)` in `.b = A{}`)
3. push dtor for field `S::b`
4. Deactivate 3 `S::b`-> This pops the cleanup.
5. Deactivate 1 `S::a` -> Does not pop the cleanup as *2* is top. Should
create _active flag_!!
6. push dtor for `~S()`.
7. ...
It is important to deactivate **5** using active flags. Without the
active flags, the `return` will fall through it and run both `~S()`
and the dtor for `S::a`, leading to a **double free** (two calls to `~A()`).
In this patch, we unconditionally emit active flags while deactivating
normal cleanups. These flags are deleted later by the `AllocaTracker` if
the cleanup is not emitted.
### Problem 2: Missing cleanup for conditional lifetime extension
We push 2 cleanups for a lifetime-extended temporary. The first cleanup is
useful if we exit from the middle of the expression (stmt-expr/coro
suspensions). This is deactivated after full-expr, and a new cleanup is
pushed, extending the lifetime of the temporaries (to the scope of the
reference being initialized).
If this lifetime extension happens to be conditional, then we use active
flags to remember whether the branch was taken and if the object was
initialized.
Previously, we used a **single** active flag, which was used by both
cleanups. This is wrong because the first cleanup will be forced to
deactivate after the full-expr and therefore this **active** flag will
always be **inactive**. The dtor for the lifetime extended entity would
not run as it always sees an **inactive** flag.
In this patch, we solve this using two separate active flags, one for each
cleanup. Both of them are activated if the conditional branch is taken,
but only one of them is deactivated after the full-expr.
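
The flag logic of Problem 2 can be modelled in a few lines. This is a simplified sketch with hypothetical names, not the CodeGen implementation: it only tracks whether the lifetime-extended destructor would run at scope exit under a shared flag versus separate flags.

```cpp
#include <cassert>

// Returns how many times the destructor of the lifetime-extended temporary
// runs at the end of the reference's scope in this simplified model.
int dtor_runs_at_scope_exit(bool branch_taken, bool separate_flags) {
  bool flag_full_expr = false; // guards cleanup 1 (exit from mid-expression)
  bool flag_extended = false;  // guards cleanup 2 (end of reference scope)

  // With a single shared flag, cleanup 2 is guarded by cleanup 1's flag.
  bool &extended_guard = separate_flags ? flag_extended : flag_full_expr;

  // Both cleanups are activated when the conditional branch is taken.
  if (branch_taken) {
    flag_full_expr = true;
    extended_guard = true;
  }

  // End of the full-expression: cleanup 1 is forcibly deactivated.
  flag_full_expr = false;

  // Scope exit: cleanup 2 runs only if its guard is still active.
  return extended_guard ? 1 : 0;
}

// Usage: the shared flag loses the cleanup, separate flags keep it.
void self_check() {
  assert(dtor_runs_at_scope_exit(true, false) == 0);
  assert(dtor_runs_at_scope_exit(true, true) == 1);
}
```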
---
Fixes https://github.com/llvm/llvm-project/issues/63818
Fixes https://github.com/llvm/llvm-project/issues/88478
---
Previous PR logs:
1. https://github.com/llvm/llvm-project/pull/85398
2. https://github.com/llvm/llvm-project/pull/88670
3. https://github.com/llvm/llvm-project/pull/88751
4. https://github.com/llvm/llvm-project/pull/88884
This patch moves the following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
out of the experimental namespace.
All these intrinsics have existed in LLVM for more than a year now and are
widely used, so they should no longer be considered experimental.
Linked to https://github.com/gnustep/libobjc2/pull/289.
More information can be found in issue: #88273.
My solution involves creating a new message-send function for this
calling convention when targeting MSVC. Additional information is
available in the libobjc2 pull request.
I am unsure whether we should check for a runtime version where
objc_msgSend_stret2_np is guaranteed to be present or leave it as is,
considering it remains a critical bug. What are your thoughts about this
@davidchisnall?
OpenACC is going to need an array sections implementation that is a
simpler, more restrictive version of the OpenMP one.
This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`,
then adds an enum to choose between the two.
This also fixes a couple of 'drive-by' issues that I discovered on the way,
but leaves the OpenACC Sema parts reasonably unimplemented (no semantic
analysis implementation), as that will be a followup patch.
Workaround for issue #89774 until it can be properly fixed.
When `-gtemplate-alias` is specified Clang emits a DW_TAG_template_alias
for template aliases. This patch avoids an assertion failure by falling
back to the `-gno-template-alias` (default) behaviour, emitting a
DW_TAG_typedef, if the alias is instantiation dependent.
This introduces a new file, RISCVISAUtils.cpp and moves the rest of
RISCVISAInfo to the TargetParser library.
This will allow us to generate part of RISCVISAInfo.cpp using tablegen.