In tandem with #146800, this PR fixes #145370.
This PR simplifies the logic for collapsing GEP chains and replacing
GEPs to multidimensional arrays with GEPs to flattened arrays. This
implementation avoids unnecessary recursion and more robustly computes
the index to the flattened array by using the GEPOperator's
collectOffset function, which has the side effect of allowing "i8 GEPs"
and other types of GEPs to be handled naturally in the flattening /
collapsing of GEP chains.
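A minimal sketch of the offset computation described here, assuming the `collectOffset`-based approach (the helper name and surrounding code are hypothetical, not the PR's):
```
// Collapse a GEP (including an "i8 GEP") to a constant byte offset when
// every index is constant; collectOffset does the per-index arithmetic.
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Operator.h"
#include <optional>
using namespace llvm;

static std::optional<APInt> getConstantByteOffset(GEPOperator *GEP,
                                                  const DataLayout &DL) {
  unsigned BW = DL.getIndexTypeSizeInBits(GEP->getType());
  SmallMapVector<Value *, APInt, 4> VariableOffsets;
  APInt ConstantOffset(BW, 0);
  if (!GEP->collectOffset(DL, BW, VariableOffsets, ConstantOffset) ||
      !VariableOffsets.empty())
    return std::nullopt; // offset depends on a runtime index
  return ConstantOffset; // byte offset into the flattened array
}
```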
Furthermore, a handful of LLVM DirectX CodeGen tests have been edited to
fix incorrect GEP offsets, mismatched types (e.g., loading i32s from an
array of floats), and typos.
Refactored IR2Vec vocabulary handling to improve code organization and error handling. This will help in upcoming PRs related to the IR2Vec tool.
(Tracking issue: #141817)
This was going out of its way to explicitly mark these as
ARM_AAPCS_VFP. This has been explicitly set since 8b40366b54bd4,
where the commit message states that "sincos" (not sincos_stret)
has a special calling convention. However, that commit also sets
the calling convention for all libcalls to ARM_AAPCS_VFP, and
getEffectiveCallingConv returns the same for CCC anyway in tests
using isWatchABI triples.
The net result of this appears to be a change in behavior when
using -float-abi=soft with isWatchABI, which has no tests, so
I assume this is a theoretical combination.
If I add the following assertions:
```
if (getTargetMachine().getTargetTriple().isWatchABI()) {
  assert(!useSoftFloat());
  assert(getEffectiveCallingConv(CallingConv::C, false) ==
         CallingConv::ARM_AAPCS_VFP);
}
```
Only 2 tests fail the second condition; these look like copy-paste
accidents. They use v7k triples with linux and only needed a filler
triple. This is a consequence of strangely using the target architecture
in place of the OS ABI check, as was done in
042a6c1fe19caf48af7e287dc8f6fd5fec158093.
We have some 48-bit instructions in the `Xqci` spec that currently
cannot be compressed to their 32-bit variants due to the constraint in
`CompressInstEmitter` on destination instruction operands not being
allowed to mismatch with the DAG operands.
For example, the `QC_E_ADDI` instruction can be compressed to the `ADDI`
instruction when the immediate fits in a signed 12-bit value, but this is
currently not possible since the `QC_E_ADDI` instruction has `GPRNoX0`
register operands while the `ADDI` instruction has `GPR` register
operands, leading to an operand type validation error.
I think we can remove the check that only source instruction operands
can mismatch with the corresponding DAG operands and rely on the fact
that we check if the DAG register operand type is a subclass of the
instruction register operand type.
This enables producing a "variable is uninitialized" warning when a
value is passed to a pointer-to-const argument:
```
void foo(const int *);
void test() {
  int *v;
  foo(v);
}
```
Fixes #37460
This option is similar to -Wuninitialized-const-reference, but diagnoses
the passing of an uninitialized value via a const pointer, like in the
following code:
```
void foo(const int *);
void test() {
  int v;
  foo(&v);
}
```
This is an extract from #147221 as suggested in [this
comment](https://github.com/llvm/llvm-project/pull/147221#discussion_r2190998730).
Reapply attempt for: https://github.com/llvm/llvm-project/pull/148291
Fix for the build failure reported in:
https://lab.llvm.org/buildbot/#/builders/116/builds/15477
-----
This crash is caused by a mismatch between the distributed type returned
by `getDistributedType` and the intended distributed type for forOp results.
Solution diff:
20c2cf6766
Example:
```
func.func @warp_scf_for_broadcasted_result(%arg0: index) -> vector<1xf32> {
  %c128 = arith.constant 128 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %2 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<1xf32>) {
    %ini = "some_def"() : () -> (vector<1xf32>)
    %0 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini) -> (vector<1xf32>) {
      %1 = "some_op"(%arg4) : (vector<1xf32>) -> (vector<1xf32>)
      scf.yield %1 : vector<1xf32>
    }
    gpu.yield %0 : vector<1xf32>
  }
  return %2 : vector<1xf32>
}
```
In this case the distributed type for the forOp result is `vector<1xf32>`
(the result is not distributed, and is instead broadcast to all lanes).
However, `getDistributedType` will return a NULL type here.
Therefore, if the distributed type can be recovered from the warpOp, we
should always do that first, before falling back to `getDistributedType`.
Enabled AArch64 runs for these tests where it made sense.
Also removed the "temporary" suffixes filter that was added over 10
years ago; I believe the "misched-copy.s output file" has been cleaned
from the runners by now...
Addresses
<https://github.com/llvm/llvm-project/pull/147421#discussion_r2191234968>
for x86
If more than one `catchpad` uses the same `alloca` for their catch
objects, then we will allocate more than one object in the fixed area,
resulting in wasted stack space.
As a follow-up, Clang could be updated to reuse the same `alloca` for
all by-reference and by-pointer catch objects.
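For illustration, a hypothetical case (not from the PR) where such sharing is safe, since at most one handler runs for any given exception:
```
#include <stdexcept>

void mayThrow();

void demo() {
  try {
    mayThrow();
  } catch (const std::runtime_error &) {
    // catchpad #1: needs a catch object slot in the fixed stack area
  } catch (const std::logic_error &) {
    // catchpad #2: catches by reference too; the same alloca could
    // back both catch objects
  }
}
```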
This check already understands how `constexpr` makes initialization
order problems impossible, and C++20's `constinit` provides the exact
same guarantees.
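A minimal sketch of the guarantee in question (illustrative, not taken from the check's tests):
```
// Both variables are initialized at compile time, so neither can
// participate in the static initialization order fiasco.
constexpr int A = 1; // already exempted by the check
constinit int B = 2; // now exempted too: constant-initialized,
                     // but still mutable at run time
```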
When one kind of diagnostics is disabled, this should not preclude other
diagnostics from displaying, even if they have lower priority. For
example, this should print a warning about passing an uninitialized
variable as a const reference:
```
> cat test.cpp
void foo(const int &);
int f(bool a) {
  int v;
  if (a) {
    foo(v);
    v = 5;
  }
  return v;
}
> clang test.cpp -fsyntax-only -Wuninitialized -Wno-sometimes-uninitialized
```
The OpenACC spec says `A var may appear at most once in all the clauses of
declare directives for a function, subroutine, program, or module.`, but
our implementation allows it and generates a warning. Add this to the
deviation list for the record.
This reverts commit d43a80936d437d217d5a6dbbaa5fb131c27e7085.
With the correctness issue blocking the recommit finally fixed
(5d01697ec6cb), again unconditionally check if accesses are completely
before or after each other.
This patch fixes:
```
llvm/lib/Analysis/IR2Vec.cpp:280:3: error: default label in switch
which covers all enumeration values [-Werror,-Wcovered-switch-default]
```
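For context, this warning fires when a switch over an enum covers every enumerator and still carries a default label; a minimal illustration (not the actual IR2Vec.cpp code):
```
enum class Kind { Alpha, Beta };

int rank(Kind K) {
  switch (K) {
  case Kind::Alpha:
    return 0;
  case Kind::Beta:
    return 1;
  default: // -Wcovered-switch-default: all enumeration values are covered
    return -1;
  }
}
```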
This reverts commit 0c0aa56cdcf1fe3970a5f3875db412530512fc07.
This time with the following fixes for buildbot failures:
- Add underscore prefixes to symbol names on Apple platforms.
- Modify the test so that it skips the crash tests on platforms where
they are not expected to pass:
- Platforms that implement FEAT_PAuth but not FEAT_FPAC (e.g.
Apple M1, Cortex-A78C)
- Platforms where DA key is disabled (e.g. older Linux kernels,
Linux kernels with PAC disabled, likely Windows)
Original commit message follows:
The emulated PAC runtime functions emulate the ARMv8.3a pointer
authentication instructions and are intended for use in heterogeneous
testing environments. For more information, see the associated RFC:
https://discourse.llvm.org/t/rfc-emulated-pac/85557
Reviewers: mstorsjo, pawosm-arm, atrosinenko
Reviewed By: atrosinenko
Pull Request: https://github.com/llvm/llvm-project/pull/148094
`__bolt_instr_data_dump()` finds the instrumented binary's name by
iterating through the entries under the `/proc/self/map_files` directory,
then opens the binary and memory-maps it onto the heap in order to locate
the `.bolt.instr.tables` section and read the descriptions.
If the binary name is already known and/or the binary is already open as
a memory mapping, we can pass the binary name and/or the memory buffer
directly to `__bolt_instr_data_dump()` to save some work.
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates symbols that were added to
LLVM in the last two weeks and were missed by previous code-mods. The
annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.
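As an illustration of what these annotations look like (the function name here is hypothetical; `LLVM_ABI` comes from `llvm/Support/Compiler.h`):
```
#include "llvm/Support/Compiler.h"

// Annotated public entry point; the macro is intended to expand to an
// export/import attribute once the Windows DLL build is enabled, and
// currently has no meaningful effect.
LLVM_ABI void someRecentlyAddedPublicFunction();
```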
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
## Overview
These changes were generated automatically using the [Interface
Definition Scanner (IDS)](https://github.com/compnerd/ids) tool,
followed by formatting with `git clang-format`.
## Validation
Local builds and tests were run to validate cross-platform compatibility.
This included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
Add a new fold to instcombine to move SExt/ZExt across identity
shuffles, applying the cast after the shuffle. This sinks extends and
can enable more general additional folding of both shuffles (and
related instructions) and extends. If a backend prefers doing the casts
first, the extends can be hoisted again, for example in VectorCombine.
A larger example is included in the load_i32_zext_to_v4i32 test. The
wider extend is easier to compute an accurate cost for, and targets
(like AArch64) can lower a single wider extend more efficiently than
multiple separate extends.
This is a generalization of a VectorCombine version
(https://github.com/llvm/llvm-project/pull/141109) as suggested by
@preames.
PR: https://github.com/llvm/llvm-project/pull/146901
This patch permits loops with vector instructions to be unrolled.
Today there is an early exit in `getUnrollingPreferences()` of AArch64
targets if a vector instruction is observed in any of the loop blocks.
This patch fixes that so common loops like this one get a chance to be
unrolled:
```
#include <arm_neon.h>

void saxpy(float *dst, const float *src, const float a, const int len) {
  float32x4_t *vdst = (float32x4_t *)dst;
  float32x4_t *vsrc = (float32x4_t *)src;
  float32x4_t vk = vdupq_n_f32(a);
  for (int i = 0; i < (len >> 2); i++) {
    vdst[i] = vaddq_f32(vdst[i], vmulq_f32(vsrc[i], vk));
  }
}
```
Auto-vectorized loops are still not unrolled, unless they were not
interleaved when vectorized.
The provided test case shows the enhancement on top of runtime/partial
unrolling, depending on the CPU.
PR: https://github.com/llvm/llvm-project/pull/147420
Clang can choose which sort of source-file hash is attached to a DIFile
metadata node. However, whenever hashing is possible, we *always* attach
a hash. This patch permits users who want DWARF5 but don't want the file
hashes to opt out, by adding a "none" option to the -gsrc-hash option
that skips hash computation.
Factoring out and combining `isInterleaveIntrinsic`,
`isDeinterleaveIntrinsic`, and `getIntrinsicFactor` into
`getInterleaveIntrinsicFactor` and `getDeinterleaveIntrinsicFactor`
inside VectorUtils.
NFC.
This PR adds the stack trace of the escaped exception to the
`bugprone-exception-escape` check.
Changes:
1. Modified the `ExceptionAnalyzer` and `ExceptionInfo` classes to hold
the stack trace of the escaped exception in an `llvm::MapVector`. An
`llvm::MapVector` is needed to hold the relative positions of the
functions on the stack while also providing fast lookup.
2. Added new diagnostics based on the `misc-no-recursion` check.
Fixes https://github.com/llvm/llvm-project/issues/87422.
The dependency directive scanner was incorrectly classifying qualified
names such as `import::inner xi` as directives. According to P1857R3,
`import` should not be treated as a directive when followed by `::`.
This change fixes that behavior.
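For example (an illustrative snippet, not the PR's test):
```
// 'import' followed by '::' begins a qualified name, so the last line
// is an ordinary declaration, not an import directive.
namespace import { struct inner {}; }

import::inner xi;
```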
According to the current LangRef, atomics of sizes larger than a
target-dependent limit are non-conformant IR. Presumably, that size
limit is `TargetLoweringBase::getMaxAtomicSizeInBitsSupported()`. As a
consequence, one would not even know whether IR is valid without
instantiating the Target backend.
To get around this, Clang's CGAtomic uses a constant "16 bytes" for the
maximally supported atomic. The verifier only checks the power-of-two
requirement.
In a discussion with jyknight, the intention is rather that the
AtomicExpandPass will just lower everything larger than the
target-supported atomic sizes to libcall (such as `__atomic_load`).
According to this interpretation, the size limit only needs to be known
by the lowering and does not affect the IR specification.
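A small sketch of that contract from the source level (illustrative; the type and function names are made up):
```
#include <atomic>

struct Big { long Payload[4]; }; // 32 bytes: above common native limits

std::atomic<Big> G;

Big readBig() {
  return G.load(); // lowers to a __atomic_load libcall, not a native atomic
}
```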
The original "target-specific size limit" had been added in
59b66883eacbc62a09c09f08bcbfdce7af46cf31. The LangRef change is needed
for #134455 because otherwise frontends need to pass a TargetLowering
object to the helper functions just to know what the target-specific
limit is.
This also changes the LangRef for atomicrmw. Are there libatomic
fallbacks for these? If not, LLVM-IR validity still depends on
instantiating the actual backend. There are also some intrinsics such as
`llvm.memcpy.element.unordered.atomic` that have this constraint but do
not change in this PR.
This patch introduces a new Python-based benchmarking tool for Clang's Lifetime Safety analysis. This script automates the process of generating targeted C++ test cases, measuring the performance of the analysis, and determining its empirical computational complexity.
The tool helps track and validate the performance of the dataflow analysis, ensuring that future optimizations have a measurable impact and that the analysis scales efficiently.
Components:
* **Generate**: Creates pathological C++ test cases with specific patterns (pointer cycles and CFG merges) designed to stress-test the analysis.
* **Compile & Trace**: Compiles the generated code using `-ftime-trace`.
* **Analyze & Report**: Performs a curve-fit on the timing data to
determine the empirical complexity (**O(n<sup>k</sup>)**) and outputs a
markdown report; the fit is sketched below.
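Presumably the complexity estimate comes from the standard power-law fit; as a sketch of the idea (an assumption about the method, not code from the tool):
```
% Model: T(N) ~ c * N^k. Taking logs makes it linear, so k is the slope
% of a least-squares fit through the points (log N, log T), and the
% reported 95% interval is the confidence interval on that slope.
T(N) \approx c \cdot N^{k}
\iff \log T(N) \approx \log c + k \cdot \log N
```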
---
**Usage**:
<details>
<summary>ninja benchmark_lifetime_safety_analysis</summary>

```
[12/13] Running Lifetime Analysis performance benchmarks...
Benchmark files will be saved in: <BUILD_DIR_REDACTED>/tools/clang/test/Analysis/LifetimeSafety/benchmark_results
Running performance benchmarks...
--- Running Test: Cycle with N=10 ---
Total: 10.11 ms | Analysis: 2.70 ms
--- Running Test: Cycle with N=25 ---
Total: 61.51 ms | Analysis: 53.05 ms
--- Running Test: Cycle with N=50 ---
Total: 688.56 ms | Analysis: 677.32 ms
--- Running Test: Cycle with N=75 ---
Total: 3.09 s | Analysis: 3.07 s
--- Running Test: Cycle with N=100 ---
Total: 9.31 s | Analysis: 9.30 s
--- Running Test: Cycle with N=150 ---
Total: 44.92 s | Analysis: 44.91 s
--- Running Test: Merge with N=10 ---
Total: 8.54 ms | Analysis: 0.00 ms
--- Running Test: Merge with N=50 ---
Total: 38.79 ms | Analysis: 27.13 ms
--- Running Test: Merge with N=100 ---
Total: 219.45 ms | Analysis: 205.20 ms
--- Running Test: Merge with N=200 ---
Total: 1.67 s | Analysis: 1.65 s
--- Running Test: Merge with N=400 ---
Total: 12.57 s | Analysis: 12.55 s
--- Running Test: Merge with N=800 ---
Total: 100.48 s | Analysis: 100.43 s
Generating Markdown Report...
```
</details>
<details>
<summary>Sample Report: </summary>
# Lifetime Analysis Performance Report
> Generated on: 2025-07-08 14:18:52
---
## Test Case: Pointer Cycle in Loop
| N | Analysis Time | Total Clang Time |
|:----|--------------:|-----------------:|
| 10 | 2.70 ms | 10.11 ms |
| 25 | 53.05 ms | 61.51 ms |
| 50 | 677.32 ms | 688.56 ms |
| 75 | 3.07 s | 3.09 s |
| 100 | 9.30 s | 9.31 s |
| 150 | 44.91 s | 44.92 s |
**Complexity Analysis:**
- The performance for this case scales approx. as **O(n<sup>3.88</sup>)**.
- **95% Confidence interval for exponent:** `[3.86, 3.90]`.
---
## Test Case: CFG Merges
| N | Analysis Time | Total Clang Time |
|:----|--------------:|-----------------:|
| 10 | 0.00 ms | 8.54 ms |
| 50 | 27.13 ms | 38.79 ms |
| 100 | 205.20 ms | 219.45 ms |
| 200 | 1.65 s | 1.67 s |
| 400 | 12.55 s | 12.57 s |
| 800 | 100.43 s | 100.48 s |
**Complexity Analysis:**
- The performance for this case scales approx. as **O(n<sup>3.00</sup>)**.
- **95% Confidence interval for exponent:** `[2.99, 3.01]`.
---
</details>