This commit removes the class `SwitchNodeBuilder` because it just
obscured the logic of switch handling by hiding some parts of it in
another source file.
This patch is motivated by
https://github.com/llvm/llvm-project/pull/189943, where we would like to
print the "these module scripts weren't loaded" warning for *all*
modules batched together. I.e., we want to print the warning *after* all
the script loading attempts, not from within each attempt.
To do so we want to hoist the `ReportWarning` calls in
`Module::LoadScriptingResourceInTarget` out into the callsites. But if
we do that, the callers have to remember to print the warnings. To avoid
this, we redirect all callsites to use
`ModuleList::LoadScriptingResourceInTarget`, which will be responsible
for printing the warnings.
To avoid future accidental uses of
`Module::LoadScriptingResourceInTarget` I moved the API into
`ModuleList` and made it `private`.
This PR aims to make the pass more generic by removing the ModuleOp
restriction. This PR reimplements the logic using a standalone
PassManager. Additionally, the isInteresting method has been updated to
accept Operation* for better flexibility. Finally, a dedicated test
directory has been added to improve the organization of OptReductionPass
tests.
addAliasScopeMetadata in AMDGPULowerKernelArguments skips instructions
with empty PtrArgs, including memory-accessing calls that have no
pointer arguments (e.g. builtins like threadIdx()). Because these calls
never receive !noalias metadata, ScopedNoAliasAA cannot prove they don't
alias noalias kernel arguments. MemorySSA then conservatively reports
them as clobbers, which prevents AMDGPUAnnotateUniformValues from
marking loads as noclobber, blocking scalarization (s_load) and forcing
expensive vector loads (global_load) instead.
Fix by adding all noalias kernel argument scopes to !noalias metadata
for memory-accessing instructions with no pointer arguments. Since such
instructions cannot access memory through any kernel pointer argument,
all noalias scopes are safe to apply.
This fixes a performance regression in rocFFT introduced by bd9668df0f00
("[AMDGPU] Propagate alias information in AMDGPULowerKernelArguments").
Assisted-by: Claude Opus
The `bool serial` condition in scanRelocations disabled parallelism for
three cases: -z nocombreloc, MIPS, and PPC64. Resolve two cases:
- nocombreloc: .rela.dyn is now always created with combreloc=true so
non-relative relocations are sorted deterministically. Since
#187964 already separates relative relocations unconditionally,
the only remaining effect of -z nocombreloc is suppressing
DT_RELACOUNT (gated on ctx.arg.zCombreloc in DynamicSection).
- PPC64: After #181496 moved scanning into scanSectionImpl, the
sole thread-unsafe access is ctx.ppc64noTocRelax (DenseSet::insert).
Protect it with ctx.relocMutex, which is already used for rare
operations during parallel scanning.
MIPS retains serial scanning due to `MipsGotSection` mutations.
This test doesn't work as intended when an alternative default linker is
specified via `-DCLANG_DEFAULT_LINKER=ld`. If this test isn't intended
to support alternate default linker, lmk I can just change the
downstream usage I'm seeing, though I figure other folks may have
similar configurations. Repro:
```
cmake -S llvm -B build -DLLVM_ENABLE_PROJECTS="clang" -DCLANG_DEFAULT_LINKER=ld -GNinja
ninja -C build
./build/bin/llvm-lit -v clang/test/DebugInfo/CXX/hotpatch.cpp
...
possible intended match
# | 6: "/usr/bin/ld" "-out:hotpatch.exe" "-libpath:lib/amd64" "-libpath:atlmfc/lib/amd64" "-nologo" "-functionpadmin" "/tmp/lit-tmp-o7x0r1o_/hotpatch-4595de.obj"
```
afaict it passed before because `-mincremental-linker-compatible` was
being used until e97a42d5f9fe51de50aabd4d9bf6874a4955f9fa, which would
match on the compilation line.
`__arm_agnostic("sme_za_state")` does not require +sme, but we must
still preserve ZA in case the function is used with code that makes use
of ZA:
> The use of `__arm_agnostic("sme_za_state")` allows writing functions
> that are compatible with ZA state without having to share ZA state
> with the caller, as required by `__arm_preserves`. The use of this
> attribute does not imply that SME is available.
A kernel developer noticed that I missed a call to index the local
filesystem in one of our codepaths, and had a use case that depended on
that working.
rdar://173814556
In some situations such as reported at
https://github.com/llvm/llvm-project/pull/177953#issuecomment-4179014239,
LLVM_(DEFAULT_)TARGET_TRIPLE is not set. It is used to derive the output
directory in #177953. Only flang-rt currently uses
RUNTIMES_(INSTALL|OUTPUT)_RESOURCE_LIB_PATH, we should not fail building
other despite a missing LLVM_TARGET_TRIPLE.
Compiler-rt uses COMPILER_RT_DEFAULT_TARGET_TRIPLE instead which it
derives itself. Most other LLVM runtimes libraries just skip the target
portion of the library path (explicitly so since #93354). Do the same
for RUNTIMES_(INSTALL|OUTPUT)_RESOURCE_LIB_PATH which we hope eventually
can replace the other mechanisms.
…(#135079)"
This reverts commit a757f23404c594f4a48b4ddb6625f88b349d11d5. Commit
causes reduce.cu file in hipcub/warp go from 2 minutes of compilation to
taking several hours.
Many tests have ad hoc forms of the launch & break steps done by
`lldbutil.run_to_source_breakpoint`. This changes some of those tests to
use `run_to_source_breakpoint` instead.
Assisted-by: claude
We had a bug where exceptions caught with catch-all were not properly
handling a thrown exception if the catch-all handler enclosed a cleanup
handler. The structured CIR was generated correctly, but when we
flattened the CFG and introduced cir.eh.initiate operations, the
cir.eh.initiate for the cleanup's EH path was incorrectly marked as
cleanup-only, even though it chained to the dispatch for the catch-all
handler. This resulted in the landing pad generated for the cleanup not
being marked as having a catch-all handler, so the exception was not
caught.
This change fixes the problem in the FlattenCFG pass.
Assisted-by: Cursor / claude-4.6-opus-high
This PR adds the TableGen-generated headers from
https://github.com/llvm/llvm-project/pull/187610 to the HLSL
distribution.
Currently the HLSL distribution is incomplete due to missing these
generated headers, preventing successful compilation:
```
Command Output (stderr):
--
In file included from <built-in>:1:
In file included from D:\a\_work\1\ClangHLSL\Binaries\lib\clang\23\include\hlsl.h:24:
D:\a\_work\1\ClangHLSL\Binaries\lib\clang\23\include\hlsl/hlsl_alias_intrinsics.h:42:10: fatal error: 'hlsl_alias_intrinsics_gen.inc' file not found
42 | #include "hlsl_alias_intrinsics_gen.inc"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
This PR fixes the error by including `hlsl_alias_intrinsics_gen.inc` and
`hlsl_inline_intrinsics_gen.inc` in the HLSL distribution.
This allows a build system to direct Clang to prune a module cache
directory using the same method Clang does internally.
This also changes `clang::maybePruneImpl` to clean up files directly in
the directory, not just subdirectories.
This moves the LLVM_LIBC_IS_DEFINED macro to its own header is
__support/macros. Its implementation leverages cpp::string_view
instead of rolling its own strcmp; this necessitated fixing
several missing constexpr in the string_view implementation.
The new __support/macros/macro-utils.h is also broken out to hold
the stringification macro and can be used in future for token
pasting shenanigans and other such generic macro machinery.
We had an off-by-one error in the CIR generation for array destructor
loops, causing us to miss destructing one element of the array. This
change fixes the problem.
As discussed in #182203, use enums instead.
I tried to name/use them appropriately, but I'm not sure sure I'm really
happy with the results; suggestions welcome.
For signed int-to-FP casts, ComputeNumSignBits can prove exactness where
computeKnownBits cannot -- e.g. through ashr(shl x, a), b where sign propagation is
tracked precisely but individual known bits are all unknown.
Summary:
`Status` is unfortunately heavily overloaded in practice. Things like
X11 define it as a macro. Best to just remove that possibility entirely.
Change `NVVM_SyncWarpOp` base class from `NVVM_Op` to
`NVVM_IntrOp<"bar.warp.sync">`, which auto-generates `llvmEnumName =
nvvm_bar_warp_sync` and registers it with
`-gen-intr-from-llvmir-conversions` and
`-gen-convertible-llvmir-intrinsics`. This enables LLVM IR to MLIR
import. The hand-written `llvmBuilder` is removed as the default
`LLVM_IntrOpBase` builder is equivalent.
Rename several arguments to intrinsic related functions from `ArgsTys`
to `OverloadTys` to better reflect their meaning. The only variables
left with name `ArgTys` now actually mean function argument types.
Also reamove an incorrect comment in Intrinsics.td. Dependent types do
allow forward references starting with
7957fc6547
The evaluation order of function arguments is unspecified by the C++
standard. We had two getNode calls as function arguments which causes
the nodes to be created in a different order depending on the compiler
used. This patch moves them to their own variables to ensure they are
called in the same order on all compilers.
Possible fix for #190148.
The `-mno-incremental-linker-compatible` switch translates to Brepro
linker flag and must be passed on to the underlying linker to match the
behavior of the Windows triples that produce PE COFF.