The NYI diagnostic in getFunctionTypeForVTable showed up a few times in
testing, so this patch is attempting to fix that up.
The reproducer here is a function type for a vtable that has an
incomplete type in it(return or parameter). Classic codegen chooses to
represent this as an opaque type.
This patch instead removes the special v-table handling here, so that we
can instead just represent the types as incomplete record types.
At the moment, this patch ends up lowering incomplete types as 'empty'
types in LLVM-IR, which we may find we need to modify in the future,
however at the moment, it seems to work.
This patch ALSO changes the definition of RecordType::isSized to only be
true for complete types, which prevents a number of other things from
attempting to add attributes/check the size of the type/etc, but those
are irrelevant for the purposes of vtable emission.
This is a pretty simple one, its just a type of decl-context. The actual
exporty-ness is handled on a per-declaration basis.
This patch just makes sure we emit them, as I suspect this will reveal
quite a bit more issues in module code I suspect.
When using `-Xclang -dump-tokens`, the lexer dump output is currently
difficult to read because the data are misaligned. The existing
implementation simply separates the token name, spelling, flags, and
location using `'\t'`, which results in inconsistent spacing.
For example, the current output looks like this on provided in this
patch example **(BEFORE THIS PR)**:
<img width="2936" height="632" alt="image"
src="https://github.com/user-attachments/assets/ad893958-6d57-4a76-8838-7fc56e37e6a7"
/>
# Changes
This small PR improves the readability of the token dump by:
+ Adding padding after the token name and after the spelling (the
padding amount was chosen empirically to produce good average
alignment).
+ Swapping the order of location and flags (since flags can take up a
lot of space and disrupt alignment).
The result is a more readable output **(AFTER THIS PR)**:
<img width="1470" height="315" alt="image"
src="https://github.com/user-attachments/assets/c24f24e5-a431-42cc-b5b6-232bac5c635e"
/>
Propagate demanded bits through readfirstlane intrinsic in
AMDGPUISelLowering with SimplifyDemandedBitsForTargetNode
implementation.
This allows upstream zero/sign extensions to be eliminated when only a
subset of bits is used after the intrinsic.
Partially addresses #128390.
* Correctly return the result when used from the console, so that
`DiagnosticsRendering` could use it to output the error.
* Add location pointer to `DILDiagnosticError` internal formatting to
show diagnostics when called from the API.
`generateInitialTensorForPartialReduction` and the `getInitSliceInfo*`
helpers unconditionally cast every result expression of the partial
result AffineMap to `AffineDimExpr`. When the original output indexing
map contains a constant (e.g. `affine_map<(d0,d1,d2)->(d0,0,d2)>`), the
constant expression propagates into the partial map and the cast
triggers an assertion.
Fixes#173025
Assisted-by: Claude Code
The affine-loop-invariant-code-motion pass was hoisting side-effectful
operations (e.g. affine.store) out of loops whose trip count is
statically known to be zero. This caused stores to execute
unconditionally even though the loop body should never run, producing
incorrect results.
The fix skips hoisting of non-memory-effect-free ops when
getConstantTripCount returns 0. Pure/side-effect-free ops are still
eligible for hoisting because they cannot change observable program
state.
Fixes#128273
Assisted-by: Claude Code
This patch completely removes `isLoopCarriedDep`, which was used
previously to identify loop-carried dependencies in the DAG. Now that we
have the DDG representation, this special handling is no longer
necessary. Simply replacing its usage with the DDG causes several tests
to fail, since cycle detection takes some of the validation-only edges
in the DDG into account. To address this, this patch introduces extra
edges in the DDG, which are used only for cycle detection and not for
other parts of the pass (e.g., scheduling). The extra edges are
determined to preserve the existing behavior of the pass as closely as
possible, which makes the predicates for adding them somewhat complex.
Split off from #135148, and the final patch in the series for #135148
The generic MemRefRewritePattern handles AllocOp/AllocaOp by calling
getFlattenMemrefAndOffset with the op's own result as the source memref.
This inserts ExtractStridedMetadataOp and ReinterpretCastOp that consume
op.result before the alloc op itself in the block. After
replaceOpWithNewOp, op.result is RAUW'd to the new ReinterpretCastOp
result, leaving those earlier ops with forward references — a domination
violation caught by MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.
Replace the AllocOp/AllocaOp cases in MemRefRewritePattern with a
dedicated AllocLikeFlattenPattern that never touches op.result until the
final replaceOpWithNewOp:
- sizes come from op.getMixedSizes() (operands, not the result)
- strides come from getStridesAndOffset on the MemRefType
- the flat allocation size is computed via
getLinearizedMemRefOffsetAndSize plus the static base offset so the
buffer covers [0, offset+extent)
- castAllocResult is simplified to take the pre-computed sizes and
strides rather than inserting an ExtractStridedMetadataOp on the
original op
- non-zero static base offsets are now correctly preserved in the
reinterpret_cast (the old code hardcoded offset=0, which was a verifier
error for layouts with offset \!= 0)
- dynamic offsets or strides bail out via notifyMatchFailure
Also remove the now-dead AllocOp/AllocaOp branches from replaceOp() and
the constexpr specialisation in getIndices().
Assisted-by: Claude Code
`mlir::affine::simplifyConstrainedMinMaxOp` called
`canonicalizeMapAndOperands` with `newOperands` that could contain null
`Value()`s. These nulls came from
`unpackOptionalValues(constraints.getMaybeValues(), newOperands)` where
internal constraint variables added by `appendDimVar` (for `dimOp`,
`dimOpBound`, and `resultDimStart*`) have no associated SSA values.
Passing null Values to `canonicalizeMapAndOperands` risks undefined
behavior:
- `seenDims.find(null_value)` in the DenseMap causes all null operands
to collide at the same key, producing incorrect dim remapping.
- Any null operand that remains referenced in the result map would
propagate as a null Value into `AffineValueMap`, crashing callers that
try to use those operands to create ops.
Fix: Before calling `canonicalizeMapAndOperands`, filter null operands
from `newOperands` by replacing their dim/symbol positions in `newMap`
with constant 0 (safe because internal constraint dims should not appear
in the bound map expression) and compacting `newOperands` to contain
only non-null Values.
Fixes#127436
Assisted-by: Claude Code
The "small range with constant divisor" optimization in
`inferAffineExpr` for `AffineExprKind::Mod` assumed that if the dividend
range span (`lhsMax - lhsMin`) is less than the divisor, then the mod
results form a contiguous range. This is not always true, as the range
can straddle a modulus boundary.
For example, `[14, 17] mod 8`:
- Span is 3 < 8, so the old condition passed
- But `14%8=6` and `17%8=1` (wraps at 16)
- `umin=6, umax=1` → assertion `umin.ule(umax)` fails
The fix adds a same-quotient check (`lhsMin/rhs == lhsMax/rhs`) to
ensure both endpoints fall within the same modular period. When they
don't, we fall back to the conservative `[0, divisor-1]` range.
Assisted-by: Cursor (Claude)
Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>
This commit removes the class `SwitchNodeBuilder` because it just
obscured the logic of switch handling by hiding some parts of it in
another source file.
This patch is motivated by
https://github.com/llvm/llvm-project/pull/189943, where we would like to
print the "these module scripts weren't loaded" warning for *all*
modules batched together. I.e., we want to print the warning *after* all
the script loading attempts, not from within each attempt.
To do so we want to hoist the `ReportWarning` calls in
`Module::LoadScriptingResourceInTarget` out into the callsites. But if
we do that, the callers have to remember to print the warnings. To avoid
this, we redirect all callsites to use
`ModuleList::LoadScriptingResourceInTarget`, which will be responsible
for printing the warnings.
To avoid future accidental uses of
`Module::LoadScriptingResourceInTarget` I moved the API into
`ModuleList` and made it `private`.
This PR aims to make the pass more generic by removing the ModuleOp
restriction. This PR reimplements the logic using a standalone
PassManager. Additionally, the isInteresting method has been updated to
accept Operation* for better flexibility. Finally, a dedicated test
directory has been added to improve the organization of OptReductionPass
tests.
addAliasScopeMetadata in AMDGPULowerKernelArguments skips instructions
with empty PtrArgs, including memory-accessing calls that have no
pointer arguments (e.g. builtins like threadIdx()). Because these calls
never receive !noalias metadata, ScopedNoAliasAA cannot prove they don't
alias noalias kernel arguments. MemorySSA then conservatively reports
them as clobbers, which prevents AMDGPUAnnotateUniformValues from
marking loads as noclobber, blocking scalarization (s_load) and forcing
expensive vector loads (global_load) instead.
Fix by adding all noalias kernel argument scopes to !noalias metadata
for memory-accessing instructions with no pointer arguments. Since such
instructions cannot access memory through any kernel pointer argument,
all noalias scopes are safe to apply.
This fixes a performance regression in rocFFT introduced by bd9668df0f00
("[AMDGPU] Propagate alias information in AMDGPULowerKernelArguments").
Assisted-by: Claude Opus
The `bool serial` condition in scanRelocations disabled parallelism for
three cases: -z nocombreloc, MIPS, and PPC64. Resolve two cases:
- nocombreloc: .rela.dyn is now always created with combreloc=true so
non-relative relocations are sorted deterministically. Since
#187964 already separates relative relocations unconditionally,
the only remaining effect of -z nocombreloc is suppressing
DT_RELACOUNT (gated on ctx.arg.zCombreloc in DynamicSection).
- PPC64: After #181496 moved scanning into scanSectionImpl, the
sole thread-unsafe access is ctx.ppc64noTocRelax (DenseSet::insert).
Protect it with ctx.relocMutex, which is already used for rare
operations during parallel scanning.
MIPS retains serial scanning due to `MipsGotSection` mutations.
This test doesn't work as intended when an alternative default linker is
specified via `-DCLANG_DEFAULT_LINKER=ld`. If this test isn't intended
to support alternate default linker, lmk I can just change the
downstream usage I'm seeing, though I figure other folks may have
similar configurations. Repro:
```
cmake -S llvm -B build -DLLVM_ENABLE_PROJECTS="clang" -DCLANG_DEFAULT_LINKER=ld -GNinja
ninja -C build
./build/bin/llvm-lit -v clang/test/DebugInfo/CXX/hotpatch.cpp
...
possible intended match
# | 6: "/usr/bin/ld" "-out:hotpatch.exe" "-libpath:lib/amd64" "-libpath:atlmfc/lib/amd64" "-nologo" "-functionpadmin" "/tmp/lit-tmp-o7x0r1o_/hotpatch-4595de.obj"
```
afaict it passed before because `-mincremental-linker-compatible` was
being used until e97a42d5f9fe51de50aabd4d9bf6874a4955f9fa, which would
match on the compilation line.
`__arm_agnostic("sme_za_state")` does not require +sme, but we must
still preserve ZA in case the function is used with code that makes use
of ZA:
> The use of `__arm_agnostic("sme_za_state")` allows writing functions
> that are compatible with ZA state without having to share ZA state
> with the caller, as required by `__arm_preserves`. The use of this
> attribute does not imply that SME is available.
A kernel developer noticed that I missed a call to index the local
filesystem in one of our codepaths, and had a use case that depended on
that working.
rdar://173814556
In some situations such as reported at
https://github.com/llvm/llvm-project/pull/177953#issuecomment-4179014239,
LLVM_(DEFAULT_)TARGET_TRIPLE is not set. It is used to derive the output
directory in #177953. Only flang-rt currently uses
RUNTIMES_(INSTALL|OUTPUT)_RESOURCE_LIB_PATH, we should not fail building
other despite a missing LLVM_TARGET_TRIPLE.
Compiler-rt uses COMPILER_RT_DEFAULT_TARGET_TRIPLE instead which it
derives itself. Most other LLVM runtimes libraries just skip the target
portion of the library path (explicitly so since #93354). Do the same
for RUNTIMES_(INSTALL|OUTPUT)_RESOURCE_LIB_PATH which we hope eventually
can replace the other mechanisms.
…(#135079)"
This reverts commit a757f23404c594f4a48b4ddb6625f88b349d11d5. Commit
causes reduce.cu file in hipcub/warp go from 2 minutes of compilation to
taking several hours.
Many tests have ad hoc forms of the launch & break steps done by
`lldbutil.run_to_source_breakpoint`. This changes some of those tests to
use `run_to_source_breakpoint` instead.
Assisted-by: claude
We had a bug where exceptions caught with catch-all were not properly
handling a thrown exception if the catch-all handler enclosed a cleanup
handler. The structured CIR was generated correctly, but when we
flattened the CFG and introduced cir.eh.initiate operations, the
cir.eh.initiate for the cleanup's EH path was incorrectly marked as
cleanup-only, even though it chained to the dispatch for the catch-all
handler. This resulted in the landing pad generated for the cleanup not
being marked as having a catch-all handler, so the exception was not
caught.
This change fixes the problem in the FlattenCFG pass.
Assisted-by: Cursor / claude-4.6-opus-high
This PR adds the TableGen-generated headers from
https://github.com/llvm/llvm-project/pull/187610 to the HLSL
distribution.
Currently the HLSL distribution is incomplete due to missing these
generated headers, preventing successful compilation:
```
Command Output (stderr):
--
In file included from <built-in>:1:
In file included from D:\a\_work\1\ClangHLSL\Binaries\lib\clang\23\include\hlsl.h:24:
D:\a\_work\1\ClangHLSL\Binaries\lib\clang\23\include\hlsl/hlsl_alias_intrinsics.h:42:10: fatal error: 'hlsl_alias_intrinsics_gen.inc' file not found
42 | #include "hlsl_alias_intrinsics_gen.inc"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
This PR fixes the error by including `hlsl_alias_intrinsics_gen.inc` and
`hlsl_inline_intrinsics_gen.inc` in the HLSL distribution.
This allows a build system to direct Clang to prune a module cache
directory using the same method Clang does internally.
This also changes `clang::maybePruneImpl` to clean up files directly in
the directory, not just subdirectories.
This moves the LLVM_LIBC_IS_DEFINED macro to its own header is
__support/macros. Its implementation leverages cpp::string_view
instead of rolling its own strcmp; this necessitated fixing
several missing constexpr in the string_view implementation.
The new __support/macros/macro-utils.h is also broken out to hold
the stringification macro and can be used in future for token
pasting shenanigans and other such generic macro machinery.