Currently, `-DLLVM_DISTRIBUTION_COMPONENTS="flang-module-interfaces"`
does not work: the distribution build (`install-distribution`) fails to
build the Fortran builtin/intrinsic modules.
This PR fixes that.
Hexagon clang recently started to define __HVX_IEEE_FP__ when the
-mhvx-ieee-fp option is specified. Guard the intrinsic macros for
instructions that should only be available with -mhvx-ieee-fp with
__HVX_IEEE_FP__.
Additionally, the following NFC changes are included:
- NFC: Remove guards around HVX v60 intrinsic macros
Hexagon v60 is the oldest Hexagon version that supports HVX so these
guards were redundant. Presence of HVX is guarded separately, once
per the whole file.
- Remove comments from closing guards (HVX protos)
These comments served a very limited purpose, as each one guards only
one macro. They were also incorrect, so remove them instead of fixing them.
This also halves the number of changes needed
when guarding conditions change.
This change fixes incorrect implicit declare mapper behavior in Flang
OpenMP lowering.
Issue:
Implicit default mappers were being attached/generated for pointer-based
implicit captures, and also on data-motion directives. That could
trigger recursive component mapping that overlaps/conflicts with
explicit user mappings, causing runtime mapping failures.
Fix:
- Skip implicit default mapper generation for implicit pointer captures
(keep support for allocatables).
- Do not auto-attach implicit mappers on target enter data, target exit
data, or target update.
- Apply the same pointer guard in the implicit target-capture lowering
path.
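The new rules can be read as a predicate that runs before any implicit default mapper is generated. The following is a minimal Python sketch of that decision, not Flang's code. `ImplicitCapture`, `may_generate_implicit_mapper`, and the directive strings are hypothetical names. A `True` result only means the new guards do not block the mapper. The existing logic still decides whether one is actually needed.

```python
from dataclasses import dataclass

# Data-motion directives on which implicit mappers are no longer auto-attached.
DATA_MOTION_DIRECTIVES = {"target enter data", "target exit data", "target update"}


@dataclass
class ImplicitCapture:
    name: str
    is_pointer: bool = False      # Fortran POINTER entity
    is_allocatable: bool = False  # Fortran ALLOCATABLE entity


def may_generate_implicit_mapper(directive: str, capture: ImplicitCapture) -> bool:
    """Return False when the new guards forbid an implicit default mapper."""
    if directive in DATA_MOTION_DIRECTIVES:
        return False  # never auto-attach on data-motion directives
    if capture.is_pointer:
        return False  # avoid recursive component maps that clash with user maps
    return True       # allocatables (and others) fall through to the existing logic
```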
The TDM base creation ops (amdgpu.make_tdm_base and
amdgpu.make_gather_tdm_base) take references to a
`%memref[%i0, %i1, ...]` for the starting point of the tiles in
global/shared memory that the TDM descriptor refers to. Memory alias ops
can be safely folded into these operations, since these two memref
operands are just pointers to a scalar starting point and don't have
semantics that depend on the memref layout (except to the extent that it
defines a location in memory).
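Only the starting element matters, so folding an alias op into these operations comes down to rewriting the indices. The following is a minimal sketch of the index arithmetic. It assumes a non-rank-reducing `memref.subview` with static offsets and strides. `fold_subview_into_indices` is a hypothetical helper, not the MLIR utility the patch uses.

```python
def fold_subview_into_indices(offsets, strides, indices):
    """Rewrite indices into a strided view as indices into the view's source.

    Element (i0, i1, ...) of a view with per-dimension offsets (o0, o1, ...)
    and strides (s0, s1, ...) is element (o0 + i0*s0, o1 + i1*s1, ...) of the
    source memref.
    """
    if not (len(offsets) == len(strides) == len(indices)):
        raise ValueError("rank mismatch")
    return [o + i * s for o, i, s in zip(offsets, indices, strides)]


# %view = memref.subview %src[4, 8] [sizes elided] [1, 2]
# make_tdm_base %view[1, 3] can then reference %src[5, 14] directly.
print(fold_subview_into_indices([4, 8], [1, 2], [1, 3]))  # [5, 14]
```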
While here, I've also cleaned up a few things: fixed the incorrect file
header and changed the tests to not use integer address spaces.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Closes: https://github.com/llvm/llvm-project/issues/174912
When generating a `mulx` instruction for a widening multiplication, LLVM
does not assign an input that is already in %rdx to the implicit source
operand. Instead, it emits two unnecessary movs before the mulx to swap
the registers. GCC already has this optimization
(as shown in the issue), so this brings the two compilers closer
together on that front.
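The idea rests on commutativity. `mulx` reads one factor implicitly from %rdx, so whichever factor is already there can serve as the implicit operand. The following toy Python model shows that choice. It is not LLVM's instruction selection, and `plan_mulx` is a hypothetical name.

```python
def plan_mulx(lhs: str, rhs: str):
    """Plan registers for a widening lhs * rhs using x86 BMI2 mulx.

    mulx takes one factor implicitly from rdx. Multiplication commutes, so if
    either factor already lives in rdx it becomes the implicit operand and no
    copies are needed. Returns (extra_moves, explicit_source), where each move
    is a (destination, source) pair.
    """
    if lhs == "rdx":
        return [], rhs
    if rhs == "rdx":
        return [], lhs
    return [("rdx", lhs)], rhs  # one copy into rdx is unavoidable here
```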
Co-authored-by: Aiden Grossman <aidengrossman@google.com>
FuncOp::verify() iterated over all blocks and called
getMutableSuccessorOperands() on any RegionBranchTerminatorOpInterface
terminator to check return types. This ran during the entrance phase of
verification — before child ops had been verified — so a malformed
terminator whose getMutableSuccessorOperands() assumed invariants
established by its own verify() could crash instead of emitting a clean
diagnostic.
Fix by switching to hasRegionVerifier=1: rename verify() →
verifyRegions() so the return-type checks run in the exit phase, after
all nested ops have already been verified.
To demonstrate the bug and guard against regressions, add
TestCrashingReturnOp to the test dialect. The op implements
RegionBranchTerminatorOpInterface and calls report_fatal_error in
getMutableSuccessorOperands() when its 'valid' unit attribute is absent,
reproducing the class of crash described above. The accompanying lit
test confirms that a clean diagnostic is emitted rather than a crash.
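The following toy model shows why the exit phase is safe. It is hypothetical Python, not MLIR's verifier. A parent's region verifier runs only after every nested op has passed its own verifier. If a nested op fails, the parent's `verifyRegions()` never runs, so it cannot hit the malformed terminator.

```python
from dataclasses import dataclass, field


class VerificationError(Exception):
    pass


@dataclass
class Op:
    name: str
    children: list = field(default_factory=list)
    valid: bool = True


def verify(op: Op, trace: list) -> None:
    # Entrance phase: the op's own verify() (hasVerifier).
    trace.append(f"verify({op.name})")
    if not op.valid:
        raise VerificationError(f"'{op.name}' op failed to verify")
    # Every nested op is verified before the parent's exit phase.
    for child in op.children:
        verify(child, trace)
    # Exit phase: verifyRegions() (hasRegionVerifier), safe to inspect children.
    trace.append(f"verifyRegions({op.name})")


trace = []
verify(Op("func", [Op("ret")]), trace)
print(trace)
# ['verify(func)', 'verify(ret)', 'verifyRegions(ret)', 'verifyRegions(func)']
```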
Currently, if an LLVM module has globals but no functions, a completely
empty SPIR-V module is emitted.
This is because global emission depends on tracking the intrinsic
functions emitted inside functions.
As a simple fix, just insert a service function, which the backend is
already set up to not actually emit, if there are no real functions.
The current use case of the service function is function pointers. I
don't think we can ever need a service function both for function
pointers and for globals with no functions, so I added an error (not an
assert) just in case we do need it for both.
Probably we should rework global handling in the future to work without
these workarounds, but this is a pretty fundamental issue so let's work
around it with this simple change for now.
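The workaround reduces to a small decision. The following Python sketch models it under assumptions: the function and parameter names are hypothetical, and the backend's real bookkeeping is more involved. The error branch mirrors the hard error (not an assert) described above.

```python
def needs_service_function(num_functions: int, num_globals: int,
                           needed_for_function_pointers: bool) -> bool:
    """Decide whether to insert the (never emitted) service function."""
    needed_for_globals = num_globals > 0 and num_functions == 0
    if needed_for_globals and needed_for_function_pointers:
        # Unexpected combination: report an error instead of asserting.
        raise RuntimeError(
            "service function needed for both function pointers and globals")
    return needed_for_globals or needed_for_function_pointers
```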
This change exposed an existing bug:
we considered basic blocks with no successors to be fall-through.
Also, fix some existing tests. The symptom was:
we previously emitted an empty module, but now that we don't, we hit a
`spirv-val` error about an invalid Function StorageClass for globals
because no `addrspace` was specified. Set the `addrspace` to `1`
(`CrossWorkgroup`) in those tests.
Closes: https://github.com/llvm/llvm-project/issues/182899
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
In verifyOperationAttribute(), the single-symbol path for shape.lib used
SymbolTable::lookupSymbolIn() followed by an explicit null check. The
array path at lines 196-197 used dyn_cast<FunctionLibraryOp>() directly
on the lookup result, which asserts when the symbol is not found (null
pointer).
Fix: use dyn_cast_or_null<> instead of dyn_cast<> so that a missing
symbol falls through to the existing "does not refer to
FunctionLibraryOp" error diagnostic instead of asserting.
Fixes #159653
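The difference between the two casts is only in how they treat a null input. The following Python analogue models LLVM's `dyn_cast` (null input asserts) and `dyn_cast_or_null` (null passes through) and applies the latter to a shape.lib-style array lookup. It is a sketch, not the MLIR code. The `@name` prefix on the diagnostic is a hypothetical formatting choice.

```python
class FunctionLibraryOp:
    pass


def dyn_cast(cls, value):
    # Like llvm::dyn_cast: the input must be non-null.
    assert value is not None, "dyn_cast on a null value"
    return value if isinstance(value, cls) else None


def dyn_cast_or_null(cls, value):
    # Like llvm::dyn_cast_or_null: a null input simply yields null.
    return None if value is None else dyn_cast(cls, value)


def verify_shape_lib(symbol_table: dict, names: list) -> list:
    errors = []
    for name in names:
        lib = dyn_cast_or_null(FunctionLibraryOp, symbol_table.get(name))
        if lib is None:  # missing symbol or wrong op kind: same diagnostic
            errors.append(f"@{name} does not refer to FunctionLibraryOp")
    return errors
```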
The test utility function `testVecAffineLoopNest` called
`isLoopParallel` with a `reductions` output parameter, which populates
reduction descriptors when the loop performs a reduction. However, these
descriptors were never added to `strategy.reductionLoops` before calling
`vectorizeAffineLoopNest`. When the vectorizer then processed a loop
with `iter_args`, it found no reduction descriptors in the strategy and
hit an assertion failure.
Fix by registering the reduction loop descriptors in the strategy before
vectorization, matching what the production vectorizer code already does
correctly.
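The following minimal Python model shows the shape of the fix. The loop, the strategy, and the vectorizer here are hypothetical stand-ins, not the affine vectorizer's API. Reductions reported by the parallelism check have to be registered in the strategy before vectorization; otherwise a loop with `iter_args` trips the assertion.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    name: str
    iter_args: list = field(default_factory=list)
    reductions: list = field(default_factory=list)  # as reported by isLoopParallel


@dataclass
class VectorizationStrategy:
    reduction_loops: dict = field(default_factory=dict)


def vectorize_loop_nest(loop: Loop, strategy: VectorizationStrategy) -> str:
    if loop.iter_args:
        # The vectorizer expects descriptors for every loop-carried reduction.
        assert loop.name in strategy.reduction_loops, "missing reduction descriptors"
    return f"vectorized {loop.name}"


def run_vec_affine_loop_nest(loop: Loop) -> str:
    strategy = VectorizationStrategy()
    if loop.reductions:
        # The fix: register the descriptors before vectorizing.
        strategy.reduction_loops[loop.name] = loop.reductions
    return vectorize_loop_nest(loop, strategy)
```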
Fixes #128334
Fixes https://github.com/llvm/llvm-project/issues/180155.
This duplicates https://github.com/llvm/llvm-project/pull/180700,
except that it also adds tests. Either PR is fine to land, but the tests
should go in.
peekNextPPToken lexed a token and mutated MIOpt, which could clear the
controlling-macro state for main files in C++20 modules mode.
Save and restore MIOpt in Lexer::peekNextPPToken.
Add regression coverage in
LexerTest.MainFileHeaderGuardedWithCPlusPlusModules, which checks that
the controlling macro is properly set in C++20 mode.
Add a source-level lit test, miopt-peek-restore-header-guard.cpp, which
checks that the warnings that depend on the MIOpt state
machine are emitted in C++20 mode.
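A peek must not leave observable side effects. The following toy Python lexer shows the save/restore pattern. Its `miopt` dict is a hypothetical stand-in for clang's MultipleIncludeOpt state, and the rule that any non-`#` token clears it is a simplification.

```python
class Lexer:
    """Toy lexer; `miopt` stands in for clang's MultipleIncludeOpt."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.miopt = {"controlling_macro_possible": True}

    def lex(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        if tok != "#":
            # Simplification: a token outside a guard clears the state.
            self.miopt["controlling_macro_possible"] = False
        return tok

    def peek_next_pp_token(self):
        # Save position and MIOpt state, lex one token, then restore both.
        saved_pos, saved_miopt = self.pos, dict(self.miopt)
        try:
            return self.lex()
        finally:
            self.pos, self.miopt = saved_pos, saved_miopt
```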
Fixes #184178
The optin.cplusplus.VirtualCall checker reports warnings for virtual
method calls during construction/destruction even when the call site is
in a system header (included via -isystem). Users cannot fix such code
and must resort to NOLINT suppressions.
Add a system header check in checkPreCall before emitting the report,
consistent with how other checkers (e.g. MallocChecker) handle this.
The function extends a `bool` (scalar or vector) to `1`/`-1`
for `true` and `0` for `false`.
There is no obvious "default" argument for a select operation, so the
original name is confusing.
This patch:
* renames this function to better signal its intent,
* makes the boolean argument explicit in the function (instead of
implicit through the first register operand of the instruction),
* renames `I` to `InsertAt`.
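The following Python sketch shows the value mapping the function produces. Lists model vectors. The text does not say when `1` rather than `-1` is chosen, so the `true_is_minus_one` flag is a hypothetical parameter, not part of the real signature. `-1` is the all-ones bit pattern, as `as_unsigned` shows.

```python
def extend_bool(value, true_is_minus_one: bool = True):
    """Extend a bool (or a list of bools modeling a vector) to integers:
    true -> -1 (all ones) or 1, false -> 0."""
    true_val = -1 if true_is_minus_one else 1
    if isinstance(value, (list, tuple)):
        return [true_val if v else 0 for v in value]
    return true_val if value else 0


def as_unsigned(x: int, width: int) -> int:
    # Two's-complement bit pattern, e.g. -1 in 32 bits is 0xFFFFFFFF.
    return x & ((1 << width) - 1)
```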
Artificial registers were added in
eb0c510ecde667cd911682cc1e855f73f341d134
as a means of giving super-registers heavier weights than
their subregisters, even when they only contain a single
physical subregister.
Artificial registers thus exist in code and participate in
register unit weight calculations, but are not supposed to be
available for register allocation.
This patch completes the support for artificial registers to:
- Ignore artificial registers when joining register unit uber
sets. Artificial registers may be members of classes that
together include registers and their sub-registers, making it
impossible to compute normalised weights for uber sets they
belong to.
We have a use case downstream relying on this being supported,
which avoids introducing a large number of additional
register classes.
- Not generate purely artificial register class intersections.
It is critical not to have such classes, as the common LLVM
codegen infrastructure will try to use them to constrain
classes of virtual registers instead of producing COPYs
whenever both the source and target register classes contain
the same artificial registers.
- Not generate sub-classes where classes with the same
non-artificial members already exist. This is mostly for
convenience. For example, the HI16-capable subset of AMDGPU's
AV_32 is VGPR_32, except VGPR_32 also contains the artificial
staging registers. If the staging registers are not ignored,
we end up with an additional generated register class,
AV_32_with_hi16_in_VGPR_16, which is harmless but also useless.
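The last two rules can be modeled with plain set operations. The following Python sketch is not TableGen's CodeGenRegisters logic. Register names such as `R0` and `S0` are hypothetical, with `S*` playing the role of artificial registers.

```python
def intersect_classes(a: frozenset, b: frozenset, artificial: frozenset):
    """Return the inferred intersection class, or None if it should not exist."""
    common = a & b
    if not common or common <= artificial:
        return None  # empty or purely artificial: do not synthesize a class
    return common


def is_redundant_subclass(candidate: frozenset, existing_classes, artificial: frozenset) -> bool:
    """A candidate adds nothing if an existing class has the same real members."""
    real = candidate - artificial
    return any(cls - artificial == real for cls in existing_classes)
```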
Eliminates a few inferred AMDGPU register classes:
- VS_32_with_hi16
- VS_32_Lo256_with_hi16
- VS_32_Lo128_with_hi16
- VRegOrLds_32_and_VS_32_Lo256
- VRegOrLds_32_and_VS_32_Lo128
- SRegOrLds_32_and_VRegOrLds_32
Causes no register class changes for other targets.
TypeOp::verify() and AttributeOp::verify() called StringRef::front() to
check for leading '!' or '#' sigils before passing the name to
isValidName(). When sym_name is empty, front() triggers an assertion
failure:
Assertion `!empty()' failed.
Fix: guard the front() calls with an emptiness check. An empty sym_name
then falls through to isValidName(), which already emits a proper
diagnostic:
error: name of type is empty
Fixes #159949
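The fix is the usual "check emptiness before reading the first character" pattern. The following Python sketch models the TypeOp path under stated assumptions. `is_valid_name` only models the empty-name check, and the sigil error text is hypothetical. Only "name of type is empty" comes from the diagnostic above.

```python
def is_valid_name(name: str, kind: str):
    """Stand-in for isValidName(); returns an error message or None."""
    if not name:
        return f"name of {kind} is empty"
    return None  # the real check validates more than emptiness


def verify_type_op(sym_name: str):
    # Only inspect the first character when one exists (the front() guard).
    if sym_name and sym_name[0] in "!#":
        return f"name of type must not start with '{sym_name[0]}'"  # hypothetical text
    return is_valid_name(sym_name, "type")
```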
Constraint lambdas in the requires body need complete template arguments
before they can be evaluated. Those arguments used to be supplied via
ImplicitConceptSpecializationDecl, which is no longer created naturally
after the normalization patch.
This patch fixes the bug by creating a temporary decl for that purpose.
The temporary decl should go away once the evaluation
context tracks template arguments.
No release note, since this is a regression fix.
Fixes #184047
The attributes `exclude_from_explicit_instantiation` and
`dllexport`/`dllimport` serve opposite purposes.
Therefore, if an entity has both, drop one of them with a warning;
which one is dropped depends on the context of the declaration.
In a template context, the `exclude_from_explicit_instantiation`
attribute takes precedence over the `dllexport` or `dllimport`
attribute. Conversely, in a non-template context, the `dllexport` and
`dllimport` attributes take precedence.
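The precedence rule fits in a few lines. The following Python sketch is a model, not Sema code. The function name and the warning text are hypothetical.

```python
def resolve_dll_conflict(attrs: set, in_template_context: bool):
    """Drop one side of the conflict; return (kept_attrs, warning_or_None)."""
    dll = {"dllexport", "dllimport"} & attrs
    if "exclude_from_explicit_instantiation" not in attrs or not dll:
        return attrs, None  # no conflict
    if in_template_context:
        dropped = dll  # exclude_from_explicit_instantiation wins
    else:
        dropped = {"exclude_from_explicit_instantiation"}  # dllexport/dllimport win
    return attrs - dropped, f"ignoring {', '.join(sorted(dropped))}"
```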
A variable with an unspecified data-sharing attribute under a
DEFAULT(NONE) clause currently only causes an error if the variable is
referenced directly in the body of the construct with DEFAULT(NONE), not
in a nested construct.
Ex:
```
!$omp parallel default(none)
!$omp task
a = 1
!$omp end task
!$omp end parallel
end
```
gfortran will error with `‘a’ not specified in enclosing ‘parallel’` on
the above. flang doesn't error.
The fix moves the error check to `CreateImplicitSymbols` and checks the
variable for a violation in all of its enclosing contexts.
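Checking enclosing contexts amounts to walking outward from the construct where the variable is referenced. The following simplified Python model is not Flang's semantics code and ignores other OpenMP data-sharing rules. `Construct` and `check_default_none` are hypothetical, and the message only imitates the gfortran wording quoted above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Construct:
    kind: str
    default_none: bool = False
    explicit_dsa: set = field(default_factory=set)  # variables named in DSA clauses
    parent: Optional["Construct"] = None


def check_default_none(var: str, innermost: Construct) -> Optional[str]:
    """Walk outward from the construct where `var` is referenced."""
    c = innermost
    while c is not None:
        if var in c.explicit_dsa:
            return None  # DSA explicitly determined at this level
        if c.default_none:
            return f"'{var}' not specified in enclosing '{c.kind}'"
        c = c.parent
    return None
```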
When SROA runs on an alloca of an empty struct type (llvm.struct<()>),
it crashes with:
Assertion `!subelementIndexMap->empty()' failed.
The root cause is in LLVMStructType::getSubelementIndexMap(): for an
empty struct (no body fields), the loop doesn't execute and an empty
DenseMap is returned as a non-null optional. Later, getTypeAtIndex()
asserts the map is non-empty, triggering the crash.
Fix this by returning std::nullopt for empty structs, indicating they
cannot be destructured. This is consistent with how LLVMArrayType
handles the zero-element case.
Fixes #108366
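The contract change is "empty means not destructurable" instead of "empty map". The following Python sketch uses a list of field type names to model the struct. The function names are hypothetical stand-ins for `getSubelementIndexMap()` and its SROA caller.

```python
from typing import Dict, List, Optional


def get_subelement_index_map(field_types: List[str]) -> Optional[Dict[int, str]]:
    """Map each field index to its type, or None if not destructurable."""
    if not field_types:
        return None  # empty struct: nothing to destructure
    return {i: ty for i, ty in enumerate(field_types)}


def sroa_candidate(field_types: List[str]) -> bool:
    # None is treated as "cannot destructure" instead of asserting later.
    return get_subelement_index_map(field_types) is not None
```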
Reverts llvm/llvm-project#184234
This is breaking SPEC and other tests.
Reproducer:
```
subroutine foo()
logical :: l1, l2
do while (l1())
if (l2()) then
call bar()
endif
enddo
end
```
The cause is a pass ordering issue between the SCFToControlFlowPass and
CfgConversionPass
[here](d0f50d5574/flang/lib/Optimizer/Passes/Pipelines.cpp (L239-L240)).
I think they need to run together somehow, because SCF and FIR
structured operations may contain each other, and neither is happy to
have a block CFG generated inside its region by the pass lowering the
other.
Reverting while this is sorted out.
When verifying return-like terminators, use
getMutableSuccessorOperands() instead of getNumOperands() so that only
the operands passed to the parent region are checked against the
function result types. This handles terminators that implement
RegionBranchTerminatorOpInterface and carry additional operands for
other successor regions (e.g. loop back-edges).
Add tests using test.loop_block_term, which has both an iter operand
(passed back to the region) and an exit operand (passed to the parent).
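The following toy Python comparison shows why counting all operands goes wrong. The `"region"`/`"parent"` split and the operand types are hypothetical, loosely modeled on `test.loop_block_term`. Only the operands forwarded to the parent should be compared with the function's result types.

```python
def old_check(all_operand_types, result_types):
    # Before: every terminator operand was compared with the results.
    return list(all_operand_types) == list(result_types)


def new_check(successor_operand_types, result_types):
    # After: only operands forwarded to the parent (getMutableSuccessorOperands).
    return list(successor_operand_types["parent"]) == list(result_types)


# One iter operand flowing back into the region, one exit operand to the parent.
operands = {"region": ["i32"], "parent": ["f32"]}
results = ["f32"]
```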
Add ODS type constraints that exclude zero-bitwidth integers (i0) from
operations in the arith and vector dialects. i0 has no meaningful
arithmetic representation and operations on it can trigger undefined
behavior (e.g. bitwidth calculations assuming non-zero width).
Changes:
- Add `AnyNonZeroBitwidthSignlessInteger` (as a `ConfinedType` over
`AnySignlessInteger`) and `AnyNonZeroBitwidthSignlessIntegerOrIndex`
to CommonTypeConstraints.td.
- Introduce `Arith_SignlessIntegerOrIndexLike` in ArithOps.td that wraps
`AnyNonZeroBitwidthSignlessIntegerOrIndex` via
`TypeOrValueSemanticsContainer`, and update
`SignlessFixedWidthIntegerLike` to use `AnyNonZeroBitwidthSignlessInteger`.
Replace all uses of the shared `SignlessIntegerOrIndexLike` in ArithOps.td
with the new dialect-local constraint.
- Update `IndexCastTypeConstraint` to use
`Arith_SignlessIntegerOrIndexLike`.
- Update `BitcastTypeConstraint` to exclude i0 by composing the already-
defined `SignlessFixedWidthIntegerLike` and `FloatLike` constraints,
keeping the definition compact (3 alternatives instead of 7).
- Add `AnyVectorOfNonI0Elem` and `AnyVectorOfNonZeroRankNonI0Elem` in
VectorOps.td and apply them to `vector.contract`, `vector.reduction`,
`vector.multi_reduction`, `vector.outerproduct`, `vector.bitcast`, and
`vector.scan`.
- Update arith/invalid.mlir with explicit i0 rejection tests covering all
integer op families (binary ops, cast ops, extended-multiply ops, cmpi,
bitcast, index_cast, index_castui) for both scalar and vector<N> forms.
- Update vector/invalid.mlir with i0 rejection tests for all covered ops.
- Remove the now-invalid i0 canonicalization tests from
arith/canonicalize.mlir.
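The patch implements these constraints in ODS/TableGen. The following Python predicate only approximates what `Arith_SignlessIntegerOrIndexLike` accepts, operating on type strings. The regexes are a simplification of MLIR type syntax (scalable vectors are not handled).

```python
import re

_INT = re.compile(r"i(\d+)")
_SHAPED = re.compile(r"(?:vector|tensor)<(?:[0-9?]+x)*([^<>]+)>")


def is_nonzero_bitwidth_signless_integer(ty: str) -> bool:
    m = _INT.fullmatch(ty)
    return m is not None and int(m.group(1)) > 0


def is_arith_signless_integer_or_index_like(ty: str) -> bool:
    """Scalar, vector, or tensor of non-i0 signless integers or index."""
    m = _SHAPED.fullmatch(ty)
    elem = m.group(1) if m else ty
    return elem == "index" or is_nonzero_bitwidth_signless_integer(elem)
```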
Fixes #177822
Fixes #179266
Fixes #180463
Fixes #181532
See also
https://discourse.llvm.org/t/rfc-reject-i0-integer-type-in-arith-and-vector-ops/90011
The `--nvgpu-optimize-shared-memory` pass crashed when processing
memrefs with vector element types (e.g., `memref<16x1xvector<16xf16>,
3>`). This occurred because getElementTypeBitWidth() calls
getIntOrFloatBitWidth(), which asserts the element type must be an
integer or float.
Thus, this PR adds an early-exit guard to return failure() when the
memref's element type is not a scalar int or float.
I wasn't sure if we should support vector types (by multiplying element
bit width by vector length) or just reject them. For now, I've
implemented it to return failure on non-scalar types.
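The guard runs before anything asks for a bit width. The following Python sketch is a model, not the NVGPU pass. The element-type regex covers only `iN`, `fN`, and `bfN` names, and the `"success"`/`"failure"` strings are hypothetical stand-ins for `LogicalResult`.

```python
import re

_SCALAR = re.compile(r"(?:i|f|bf)(\d+)")


def scalar_bit_width(elem_type: str):
    """Bit width of an integer/float element type, or None for anything else."""
    m = _SCALAR.fullmatch(elem_type)
    return int(m.group(1)) if m else None


def optimize_shared_memory(elem_type: str) -> str:
    width = scalar_bit_width(elem_type)
    if width is None:
        return "failure"  # early exit instead of asserting (e.g. vector<16xf16>)
    # The real pass would go on to use `width` for its rewrite here.
    return "success"
```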
Fixes #177823
Co-authored-by: rebel-jueonpark <jueonpark@rebellions.ai>
When a scf.while op has a loop-carried value whose type converts to
emitc::ArrayType (e.g. memref<1xf64>), the WhileLowering pattern
unconditionally called emitc::LValueType::get(arrayType), which
triggered an assertion because LValueType cannot wrap an array type.
Fix by returning a match failure in createVariablesForResults and
createVariablesForLoopCarriedValues when the converted type is an
emitc::ArrayType. This converts the crash into a proper legalization
failure.
Fixes #182649
Add a canonicalization pattern that replaces block arguments with a
common SSA value when all predecessors pass the same value for that
argument. This allows the block argument to be removed by dead code
elimination. This is a first iteration.
Idea from #182711
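The core of such a pattern is a per-argument "do all incoming values agree?" check. The following Python sketch models values as strings and block arguments as `%argN`. It is hypothetical, not the pattern's code, and it ignores other legality checks a real rewrite may need. It does skip the self-loop case, where the only incoming value is the argument itself.

```python
def find_uniform_block_args(num_args: int, predecessor_operands):
    """Map block-argument index -> value when every predecessor passes it.

    predecessor_operands holds one operand list per predecessor branch.
    """
    replacements = {}
    for i in range(num_args):
        incoming = {ops[i] for ops in predecessor_operands}
        # Exactly one distinct value, and not the argument itself.
        if len(incoming) == 1 and incoming != {f"%arg{i}"}:
            replacements[i] = next(iter(incoming))
    return replacements
```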
As of version 19.15 (Visual Studio 2017 version 15.8), MSVC predefines
the `_MSVC_TRADITIONAL` macro to indicate whether it is using the old
"traditional" preprocessor or the new standards-conforming preprocessor.
Clang now predefines `_MSVC_TRADITIONAL` as 1 when emulating MSVC 19.15
or later, since Clang supports most traditional preprocessor behaviors
(e.g. `/##/` turning into `//`) when running in MSVC compatibility mode.
Currently there isn't a situation where it makes sense for Clang to
report `_MSVC_TRADITIONAL` as 0, since MSVC compatibility mode only
attempts to be compatible with the traditional MSVC preprocessor.
However, this does mean that clang-cl cannot match MSVC's behavior of
implicitly enabling the conforming C preprocessor when compiling with
`/std:c11`, `/std:c17`, or `/std:clatest`.
Fixes #47114
When the message includes a final newline, Formatv can add that for you.
The only unusual change is one place in platform where we need to print
octal. LLVM doesn't have a built-in way to do this (see
llvm/include/llvm/Support/FormatProviders.h), and this is probably the
only place in the codebase that wants to, so I decided not to add it
there.
Instead, I've put the number into a format adapter with the normal printf
specifier, then put that into the Formatv format.
Makes it possible to include Python-defined rewrite patterns in
transform-dialect schedules, inside of `transform.apply_patterns`, which
upon execution of the schedule runs the pattern in a greedy rewriter.
With assistance of Claude.
Instead of using memory buffers without file backing, this patch
creates the `input_line_N` buffers as virtual files.
This enables us to use input line numbers when verifying `clang-repl`
tests.
Co-authored-by: Vassil Vassilev <v.g.vassilev@gmail.com>