When SPIRV-LLVM-Translator is built in-tree (i.e., placed in the
llvm/projects folder), the llvm-spirv target exists.
Drop the legacy llvm-spirv_target dependency (was for non-runtime builds)
and add llvm-spirv to the runtimes dependencies.
Fixes #188131
This change addresses the stylistic changes @bogners requested in
https://github.com/llvm/llvm-project/pull/186215/. It also adds
`storeMatrixArrayFromVector` to SPIRVLegalizePointerCast.cpp for the case
where we detect the matrix array of vector memory layout.
Changes to storeArrayFromVector were cleanup.
Assisted-by: GitHub Copilot for test case check lines
The docgen script was previously hardcoded to assume all implemented
macros must be placed in a *-macros.h header. This updates docgen to
read inline macro_value properties directly from the source YAML files,
correctly recognizing them as implemented.
Define the POSIX cpio.h header and its standard macros in the libc build
system. Configure the macros directly in the YAML specification to allow
automated header generation without a custom definition template.
After this change, Address::SetSection() had only one use left (in a
unit test) and was removed. Address::ClearSection() had no uses and is
now also removed. (It is unlikely that someone needs to change the section
without simultaneously changing the section offset, and for that we have
a constructor.)
When functions marked with `[[gnu::warning/error]]` are called through inlined functions, we now show the inlining chain that led to the call when ``-fdiagnostics-show-inlining-chain`` is enabled.
With this flag, two modes are possible:
- **heuristic** mode: Uses `srcloc` and `inlined.from` metadata to reconstruct the inlining chain. Functions that are `inline`, `static`, `always_inline`, or in anonymous namespaces get `srcloc` metadata attached. This mode emits a note suggesting `-gline-directives-only` for more accurate locations.
- **debug** mode: Automatically used instead of heuristic when building with at least `-gline-directives-only` (implied by `-g1` or higher). Leverages `DILocation` debug info for reliable source locations.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1571
Make sure pattern exclusions have priority over the overflow behavior types when deciding whether or not to emit truncation checks.
Accomplish this by carrying an extra field through `ScalarConversionOpts` which we later check before emitting instrumentation.
Adds basic support for new heuristics for the CoExecSchedStrategy.
InstructionFlavor provides a way to map instructions to different
"Flavors". These "Flavors" all have special scheduling considerations --
either they map to different HardwareUnits, or have unique scheduling
properties like fences.
HardwareUnitInfo provides a way to track and analyze the usage of some
hardware resource across the current scheduling region.
CandidateHeuristics holds the state for new heuristics, as well as the
implementations.
In addition, this adds new heuristics to use the various support pieces
listed above. tryCriticalResource attempts to schedule instructions that
use the most demanded HardwareUnit. If no such instructions are ready to
be scheduled, tryCriticalResourceDependency attempts to schedule
instructions which enable instructions that use demanded HardwareUnits.
We are adding the new heuristics incrementally. In the meantime, the
state of tryCandidateCoexec may not be great, as is the case after this
PR.
This patch teaches the MemProf matching pass to dump inline call
stacks as analysis remarks like so:
frame: 704e4117e6a62739 main:10:5
frame: 273929e54b9f1234 foo:2:12
inline call stack: 704e4117e6a62739,273929e54b9f1234
The output consists of two types of remarks:
- "frame": Acts as a dictionary mapping a unique MD5-based FrameID
to source information (function name, line offset, and column).
- "inline call stack": Provides the full call stack for a call site
as a sequence of FrameIDs.
Both types of remarks are deduplicated to reduce the output size.
This patch is intended to be a debugging aid.
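As a rough illustration of how the two remark types fit together, the sketch below parses the example output shown above and resolves the FrameIDs in the call stack back to their source information. The helper name `resolve_inline_call_stacks` is hypothetical, and it assumes each remark arrives as one plain-text line with no extra prefix, which may not match the real remark stream.

```python
def resolve_inline_call_stacks(lines):
    """Resolve every "inline call stack" remark to its frames' source info."""
    frames = {}  # FrameID -> source information, e.g. "main:10:5"
    stacks = []  # each entry is a list of FrameIDs
    for line in lines:
        line = line.strip()
        if line.startswith("frame: "):
            frame_id, source = line[len("frame: "):].split(" ", 1)
            frames[frame_id] = source
        elif line.startswith("inline call stack: "):
            stacks.append(line[len("inline call stack: "):].split(","))
    # Resolve only after all lines are read, so remark order does not matter.
    return [[frames[frame_id] for frame_id in stack] for stack in stacks]


# The three remark lines from the example output (assumed plain-text form).
remarks = [
    "frame: 704e4117e6a62739 main:10:5",
    "frame: 273929e54b9f1234 foo:2:12",
    "inline call stack: 704e4117e6a62739,273929e54b9f1234",
]
resolved = resolve_inline_call_stacks(remarks)
print(resolved)
```

Because "frame" remarks are deduplicated, a consumer needs a dictionary like `frames` to expand each call stack.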
Summary:
People need to be able to build this without a CUDA installation.
Long term we should bump up the minimum version as I'm pretty sure every
architecture before this has been deprecated by NVIDIA.
This is failing in some configurations on AArch64 Linux. Given there are
a lot of follow-up commits that make this hard to revert, just disable
it for now pending future investigation.
When git-clang-format is invoked with no explicit commit arguments and
both PRE_COMMIT_FROM_REF and PRE_COMMIT_TO_REF are set, the script
automatically uses those refs as the diff range and implies --diff. If
the variables are absent, existing behavior is fully preserved.
This allows projects to use `git-clang-format` directly inside CI
pipelines via the [pre-commit](https://pre-commit.com/) framework
without any wrapper scripts or extra configuration.
Closes: #188813
No existing lit test suite for this script. Verified manually that env
vars activate two-commit diff mode, existing behavior is preserved
without them, and explicit CLI args always override them.
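The selection rule described above can be sketched as follows. This is not the script's actual code: `pick_diff_range` is a hypothetical helper, and only the two environment variable names come from the description.

```python
import os


def pick_diff_range(cli_commits, environ=None):
    """Return (commits, implies_diff) following the rule described above.

    Hypothetical helper: explicit commit arguments always win; the
    pre-commit refs are used only when both variables are set.
    """
    environ = os.environ if environ is None else environ
    from_ref = environ.get("PRE_COMMIT_FROM_REF")
    to_ref = environ.get("PRE_COMMIT_TO_REF")
    if not cli_commits and from_ref and to_ref:
        # Two-commit diff mode, with --diff implied.
        return [from_ref, to_ref], True
    # Otherwise fall back to whatever the command line provided.
    return list(cli_commits), False


env = {"PRE_COMMIT_FROM_REF": "origin/main", "PRE_COMMIT_TO_REF": "HEAD"}
print(pick_diff_range([], env))
print(pick_diff_range(["HEAD~1"], env))
```

Passing the environment as a parameter keeps the sketch easy to exercise without modifying the real process environment.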
This change adds an implementation of named barriers for the SPIRV backend.
Since there is no built-in API/intrinsic for named barriers in SPIRV,
the implementation loosely follows the implementation for AMD.
When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a
non-last position within a struct() assembly format directive, the
printed output is ambiguous: the comma-separated array elements are
indistinguishable from the struct-level commas separating key-value
pairs.
Fix this by wrapping such parameters in square brackets in both the
generated printer and parser. The printer emits '[' before and ']' after
the array value; the parser calls parseLSquare()/parseRSquare() around
the FieldParser call. Parameters with a custom printer or parser are
unaffected (the user controls the format in that case).
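To make the ambiguity concrete, here is a small model of the printed form, not the generated MLIR code. The function `print_struct`, the field names, and the exact `key = value` spelling are assumptions for illustration.

```python
def print_struct(fields, bracket_arrays):
    """Print (key, value) pairs the way a struct()-like directive might.

    Illustrative model only: list values stand in for array parameters,
    and brackets are added just for arrays in a non-last position.
    """
    parts = []
    last = len(fields) - 1
    for index, (key, value) in enumerate(fields):
        if isinstance(value, list):
            text = ", ".join(str(element) for element in value)
            if bracket_arrays and index != last:
                text = "[" + text + "]"
        else:
            text = str(value)
        parts.append(f"{key} = {text}")
    return ", ".join(parts)


fields = [("dims", [1, 2, 3]), ("rank", 4)]
ambiguous = print_struct(fields, bracket_arrays=False)
bracketed = print_struct(fields, bracket_arrays=True)
print(ambiguous)  # dims = 1, 2, 3, rank = 4
print(bracketed)  # dims = [1, 2, 3], rank = 4
```

In the first output a parser cannot tell where the array ends and the next key begins; the brackets in the second output remove that guesswork.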
Fixes #156623
Assisted-by: Claude Code
XeVM currently doesn't support native binary generation. This PR enables
Ahead of Time (AOT) compilation of a gpu module to a native binary using
`ocloc`.
Currently, this only works with LevelZeroRuntimeWrappers.
When -msan-dump-strict-instructions and
-msan-dump-heuristic-instructions are simultaneously enabled, it is
unclear from the output whether each instruction is strictly vs.
heuristically handled. [*] This patch fixes the issue by tagging the
output.
The actual instrumentation of the code is unaffected by this change.
[*] A workaround is to compile the code once with only
-msan-dump-strict-instructions, and a second time with
-msan-dump-heuristic-instructions, but this unnecessarily doubles the
compilation time.
Switches the following headers to hdrgen-produced ones by referencing
some macro from the C standard and the file containing the declarations
in the corresponding YAML files:
* limits.h (referenced _WIDTH / _MAX / _MIN families).
* locale.h (referenced LC_ family).
* time.h (referenced CLOCKS_PER_SEC).
* wchar.h (referenced WEOF).
The in-process ThinLTO backend typically generates object files in
memory and adds them directly to the link, except when the ThinLTO cache
is in use. DTLTO is unusual in that it adds files to the link from disk
in all cases.
When the ThinLTO cache is not in use, ThinLTO adds files via an
`AddStreamFn` callback provided by the linker, which ultimately appends
to a `SmallVector` in LLD. When the cache is in use, the linker supplies
an `AddBufferFn` callback that adds files more efficiently (by moving
`MemoryBuffer` ownership).
This patch adds a mandatory `AddBufferFn` to the DTLTO ThinLTO backend.
The backend uses this to add files to the link more efficiently.
Additionally:
- Move AddStream from CGThinBackend to InProcessThinBackend, for reader
clarity.
- Modify linker comments that implied the AddBuffer path is
cache-specific.
For a Clang link (Debug build with sanitizers and instrumentation) using
an optimized toolchain (PGO non-LTO, llvmorg-22.1.0), measuring the mean
`Add DTLTO files to the link` time trace scope duration:
- On Windows (Windows 11 Pro Build 26200, AMD Family 25 @ ~4.5 GHz, 16
cores/32 threads, 64 GB RAM), this patch reduces the mean from
2799.148 ms to 157.972 ms.
- On Linux (Ubuntu 24.04.3 LTS Kernel 6.14, Ryzen 9 5950X, 16
cores/32 threads, boost up to 5.09 GHz, 64 GB RAM), this patch reduces
the mean from 255.291 ms to 41.630 ms.
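For a sense of scale, the reported means work out to roughly a 17.7x reduction on Windows and a 6.1x reduction on Linux. The short calculation below only restates the numbers quoted above; the variable names are illustrative.

```python
# (before_ms, after_ms) for the `Add DTLTO files to the link` scope, as quoted above.
measurements_ms = {
    "Windows": (2799.148, 157.972),
    "Linux": (255.291, 41.630),
}

# Ratio of the mean before the patch to the mean after it.
speedups = {
    platform: before / after
    for platform, (before, after) in measurements_ms.items()
}

for platform, factor in speedups.items():
    print(f"{platform}: {factor:.1f}x")
```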
Based on work by @romanova-ekaterina and @kbelochapka.
This reverts commit 11b439c5c5a07c95d30ce25abd6adf7f5fbb7105.
timeTraceProfilerCleanup() can be called before the temporary file
deletion has completed in LLD. This causes memory leaks that were
flagged up by sanitizer builds, e.g.:
https://lab.llvm.org/buildbot/#/builders/24/builds/18840/steps/11/logs/stdio
Added a brief description of the libc-shared-tests target to the
Building and Testing page.
This target allows running tests for shared standalone components like
math primitives without the full libc runtime.
`loopUnrollByFactor` used `getConstantIntValue()` to read loop bounds,
which sign-extends the constant to `int64_t`. For unsigned `scf.for`
loops with narrow integer types (e.g. i1, i2, i3), this produces wrong
results: a bound such as `1 : i1` has `getSExtValue() == -1` but should
be treated as `1` (unsigned).
Two bugs were introduced by this:
1. **Wrong epilogue detection**: the comparison `upperBoundUnrolledCst <
ubCst` used signed int64, so e.g. `0 < -1` (where ubCst is the
sign-extended i1 value 1) evaluated to false, suppressing the epilogue
that should execute the remaining iterations.
2. **Zero step after overflow**: when `tripCountEvenMultiple == 0` (all
iterations go to the epilogue), `stepUnrolledCst = stepCst *
unrollFactor` can overflow the bound type's bitwidth and wrap to 0. A
zero step causes `constantTripCount` to return `nullopt`, preventing the
zero-trip main loop from being elided.
Fix:
- Use zero-extension (`getZExtValue`) instead of sign-extension when
reading bounds for unsigned loops.
- When `tripCountEvenMultiple == 0`, keep the original step for the main
loop to avoid the zero-step issue (the step value is irrelevant for a
zero-trip loop anyway).
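Both bugs can be reproduced with plain integer arithmetic. The sketch below models narrow integers in Python rather than using the MLIR APIs; the helper names `sign_extend`, `zero_extend`, and `wrap` are illustrative and not part of the patch.

```python
def sign_extend(raw, width):
    """Interpret the low `width` bits of `raw` as a signed integer."""
    raw &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return raw - (1 << width) if raw & sign_bit else raw


def zero_extend(raw, width):
    """Interpret the low `width` bits of `raw` as an unsigned integer."""
    return raw & ((1 << width) - 1)


def wrap(value, width):
    """Truncate `value` to `width` bits, as fixed-width arithmetic would."""
    return value & ((1 << width) - 1)


# Bug 1: an upper bound of `1 : i1` read with sign extension becomes -1.
ub_signed = sign_extend(1, 1)
ub_unsigned = zero_extend(1, 1)
epilogue_needed_signed = 0 < ub_signed      # False: epilogue wrongly suppressed
epilogue_needed_unsigned = 0 < ub_unsigned  # True: epilogue correctly kept

# Bug 2: step 1 times unroll factor 2 does not fit in i1 and wraps to 0.
step_unrolled = wrap(1 * 2, 1)

print(ub_signed, ub_unsigned, epilogue_needed_signed,
      epilogue_needed_unsigned, step_unrolled)
```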
Fixes #163743
Assisted-by: Claude Code
This PR adds scalar types to the convert_layout op's result and operand.
It also enhances the convert_layout patterns in wg-to-sg, unrolling, and
sg-to-lane distribution.
This is to support reduction to scalar, where the layout propagation
currently doesn't support a scalar carrying any layout. The design
choice is to insert a convert_layout op after the reduction-to-scalar op
to record the layout information permanently across the passes.
Fix broken POSIX basedefs links for nested headers in llvm-libc docs.
The docgen script currently emits paths like `sys/wait.h.html`, but the
Open Group uses `sys_wait.h.html`, for example:
- https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_wait.h.html
This updates nested-header link generation while leaving flat headers
unchanged.
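The mapping can be sketched as below. This is an assumption about the shape of the fix, not the docgen script's actual code; `posix_basedefs_link` is a hypothetical name, and the base URL is taken from the example link above.

```python
BASEDEFS_URL = "https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/"


def posix_basedefs_link(header):
    """Build the basedefs URL for a header such as "sys/wait.h".

    Hypothetical helper: nested headers have their "/" replaced with "_";
    flat headers contain no "/" and pass through unchanged.
    """
    return BASEDEFS_URL + header.replace("/", "_") + ".html"


print(posix_basedefs_link("sys/wait.h"))
print(posix_basedefs_link("time.h"))
```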
This is to highlight places where we (probably unintentionally)
construct an `Address` object from an already resolved address, making
it unresolved again.
See the changes in `DynamicLoaderDarwin.cpp` for a quick example.
Also, use this constructor instead of `Address(lldb::addr_t file_addr,
const SectionList *section_list)` when `section_list` is `nullptr`.
The revision clears a long-overdue TODO: it supports the lowering when
transfer_read/write ops have a mask, by inserting a vector.shape_cast op
for the masked value.
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Refactor to proceed #185964.
Much of this is a refactor to address this issue. Instead of iterating over one chain at a time, attempting all VFs for that given chain, we now iterate over VFs, trying each chain for the current VF.
Includes fix for use after free bug.
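The change in iteration order can be pictured with two nested loops swapped. This sketch only shows the order in which (chain, VF) pairs are visited; the function names and the sample chains and VFs are placeholders, and what "trying" a chain involves is not modeled.

```python
def chain_major_order(chains, vfs):
    """Old order: fix a chain, then attempt every VF for it."""
    return [(chain, vf) for chain in chains for vf in vfs]


def vf_major_order(chains, vfs):
    """New order: fix a VF, then try each chain for that VF."""
    return [(chain, vf) for vf in vfs for chain in chains]


chains = ["chainA", "chainB"]  # placeholder chain names
vfs = [8, 4]                   # placeholder vectorization factors

old_order = chain_major_order(chains, vfs)
new_order = vf_major_order(chains, vfs)
print(old_order)
print(new_order)
```

Both orders visit the same pairs; only the grouping differs, so every chain is considered at one VF before any chain is considered at the next.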
Summary:
Currently, building through runtimes will yield this warning:
```
CMake Warning at compiler-rt/cmake/Modules/CompilerRTUtils.cmake:335 (message):
LLVMTestingSupport not found in LLVM_AVAILABLE_LIBS
Call Stack (most recent call first)
```
This is because the builtins target does not go through the standard
runtimes path, which sets them as BUILDTREE_ONLY so they do not show up.
These are not used in this case, so just guard the condition to suppress
the warning.