This reverts commit 27539c3f903be26c487703943d3c27d45d4542b2. Retry
with new buildbot configuration after master restart.
Original message:
Remove the FLANG_INCLUDE_RUNTIME option which was replaced by
LLVM_ENABLE_RUNTIMES=flang-rt.
The FLANG_INCLUDE_RUNTIME option was added in #122336 which disables the
non-runtimes build instructions for the Flang runtime so they do not
conflict with the LLVM_ENABLE_RUNTIMES=flang-rt option added in #110217.
In order to not maintain multiple build instructions for the same thing,
this PR completely removes the old build instructions (effectively
forcing FLANG_INCLUDE_RUNTIME=OFF).
As per discussion in
https://discourse.llvm.org/t/buildbot-changes-with-llvm-enable-runtimes-flang-rt/83571/2
we now implicitly add LLVM_ENABLE_RUNTIMES=flang-rt whenever Flang is
compiled in a bootstrapping (non-standalone) build. Because it is
possible to build Flang-RT separately, this behavior can be disabled
using `-DFLANG_ENABLE_FLANG_RT=OFF`. Also see the discussion on
implicitly adding runtimes/projects in #123964.
This patch:
- Added a new attribute `nontemporal` to the fir.load and fir.store operations in the FIR dialect.
- Added a pass `lower-nontemporal`, called before the FIRToLLVM conversion pass, which adds the nontemporal attribute to loads and stores on the list items specified in the nontemporal clause of the SIMD directive.
- Set the `UnitAttr:$nontemporal` on llvm.load and llvm.store operations during FIR to LLVM dialect conversion if the corresponding fir.load or fir.store operations have the nontemporal attribute.
- Attached the nontemporal metadata to load and store instructions that have the nontemporal attribute during LLVM dialect to LLVM IR translation.
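As a hedged illustration (subroutine and array names are made up), the kind of Fortran this applies to:
```fortran
subroutine axpy(a, b, c, n)
  integer :: n, i
  real :: a(n), b(n), c(n)
  ! Accesses to a inside the SIMD loop get the nontemporal attribute on
  ! fir.load/fir.store and end up carrying !nontemporal metadata in LLVM IR.
  !$omp simd nontemporal(a)
  do i = 1, n
    a(i) = b(i) + c(i)
  end do
end subroutine axpy
```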
Without this change I get a build error due to the missing Bye target
when I configure my build with -DLLVM_INCLUDE_EXAMPLES=OFF.
This check for LLVM_BUILD_EXAMPLES matches the checks in llvm and lld.
Reviewed By: mgorny
Pull Request: https://github.com/llvm/llvm-project/pull/137908
When privatizing allocatable/pointer arrays, the code was creating a
temporary, but that temporary was of box type. This led to an
inconsistency between the input and output of the recipe.
The updated logic now creates storage when a box reference is requested.
This patch adds support for translating `firstprivate` clauses on `omp.target` ops when translating from MLIR to LLVM IR.
Presently, this PR is restricted to supporting only included tasks; i.e., `!$omp target nowait firstprivate(some_variable)` will likely not work correctly even if it produces object code.
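A hedged sketch of the supported case, an included task with illustrative names:
```fortran
subroutine scale(n)
  integer :: n, factor
  factor = 2
  ! factor is firstprivatized: the target region gets its own copy,
  ! initialized from the host value when the region starts.
  !$omp target firstprivate(factor) map(tofrom: n)
  n = n * factor
  !$omp end target
end subroutine scale
```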
This patch produces the following TBAA tree for a function:
```
Function root
|
"any access"
|
|- "descriptor member"
|- "any data access"
   |
   |- "dummy arg data"
   |- "target data"
      |
      |- "allocated data"
      |- "direct data"
      |- "global data"
```
The TBAA tags are assigned using the following logic:
* All POINTER variables point to the root of "target data".
* Dummy arguments without POINTER/TARGET point to their
leaves under "dummy arg data".
* Dummy arguments with TARGET point to the root of "target data".
* Global variables without descriptors point to their leaves under
"global data" (including the ones with TARGET).
* Global variables with descriptors point to their leaves under
"direct data" (including the ones with TARGET).
* Locally allocated variables point to their leaves under
"allocated data" (including the ones with TARGET).
This change makes it possible to disambiguate globals like:
```
module data
real, allocatable :: a(:)
real, allocatable, target :: b(:)
end
```
Indeed, two direct references to global variables cannot alias
even if either or both of them have the TARGET attribute.
In addition, dummy arguments without POINTER/TARGET cannot alias
any other variable, even one with POINTER/TARGET. This was not
expressed in TBAA before this change.
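For example, a hedged sketch reusing the module above (subroutine name illustrative):
```fortran
subroutine scale_all(x)
  use data
  real :: x(:)    ! no POINTER/TARGET: x cannot alias a or b
  x = 2.0 * x
  a = 0.0         ! the store to the global cannot clobber x
end subroutine scale_all
```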
As before, any "unknown" memory references (such as with Indirect
source, as classified by FIR alias analysis) may alias with
anything, as long as they point to the root of "any access".
Please consider counterexamples for which this structure
may not work.
`addArchSpecificRPath` should return immediately on AIX, as AIX does not
support the `rpath` option.
`getArchSpecificLibPaths` should return early as well, since we do not
want `-L<ArchSpecificLibPaths>` passed to the linker on AIX.
Add the following semantic checks for the ALLOCATE directive as per the
OpenMP 6.0 standard, as sketched in the example below.
- A list item in an ALLOCATE directive must not be a dummy argument
- A list item in an ALLOCATE directive must not have the POINTER attribute
- A list item in an ALLOCATE directive must not be an associate name
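A hedged sketch of code these checks diagnose (names are illustrative):
```fortran
subroutine f(arg)
  use omp_lib
  integer :: arg                                        ! dummy argument
  integer, pointer :: ptr
  integer :: ok
  !$omp allocate(arg) allocator(omp_default_mem_alloc)  ! rejected: dummy argument
  !$omp allocate(ptr) allocator(omp_default_mem_alloc)  ! rejected: POINTER attribute
  !$omp allocate(ok)  allocator(omp_default_mem_alloc)  ! accepted
end subroutine f
```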
The AArch64 procedure call standard does not mandate that the callee
extend the return value. Clang does not add signext to functions
returning i8 or i16 on Linux AArch64, but Flang does.
This means that runtime routines returning i8 will have signext on the
call site/declaration but not on the implementation, and the call site
will assume the return value has already been sign-extended when it has
not. This showed up in a test case calling MINVAL on an array of
INTEGER*1.
Adjust our integer extension flags to match Clang and the AArch64 PCS on
Linux. The behavior on Darwin is preserved; this is listed in the Apple
developer guide as a divergence from the AArch64 PCS.
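The mismatch can be reproduced with a one-liner like this hedged sketch:
```fortran
program p
  integer(kind=1) :: a(3)
  a = [integer(kind=1) :: -3, -2, -1]
  ! MINVAL lowers to a runtime call returning i8; the call site assumed the
  ! result was already sign-extended, which the implementation did not do.
  print *, minval(a)
end program p
```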
In #136610 we agreed that all async clauses on compute constructs should
act as 'only one per device-type group'. On `data`, the clause has the
same specification language and the same real requirements, so it seems
sensible to make it work the same way.
The OmpAtomicClause is a variant of a few specific clauses that are used
on the ATOMIC construct. The HINT clause, however, was represented as a
generic OmpClause, which somewhat complicated the analysis of an
OmpAtomicClause.
Introduce OmpHintClause to represent the contents of the HINT clause,
and use it on OmpAtomicClause similarly to how OmpFailClause is used.
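For reference, a hedged sketch of the construct whose parse tree changes (variable name illustrative):
```fortran
subroutine bump(counter)
  use omp_lib
  integer :: counter
  ! HINT now parses into OmpHintClause inside the OmpAtomicClause list.
  !$omp atomic update hint(omp_sync_hint_uncontended)
  counter = counter + 1
end subroutine bump
```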
This PR adds support for generating the `acc.bounds` operation
through `MappableType`'s `generateAccBounds` when there is no fir.box
entity. This is especially useful because the FIR type does not capture
size information for explicit-shape arrays, and the current
implementation relied on finding the box entity.
This scenario is possible because during HLFIRtoFIR, `fir.array_coor`
and `fir.box_addr` operations are often optimized to use the raw
address. If one tries to map the SSA value that represents such a
variable, the correct dimensions need to be extracted from the shape
information held in the fir declare operation.
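A hedged Fortran sketch of the situation (names illustrative): the dummy is explicit-shape, so no fir.box is created for it.
```fortran
subroutine add_one(a, n)
  integer :: n, i
  real :: a(n)              ! explicit-shape: bounds come from the declared shape
  !$acc parallel loop copy(a)
  do i = 1, n
    a(i) = a(i) + 1.0
  end do
end subroutine add_one
```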
Previously the unparser would print something like
```
!$OMP CANCEL CANCELLATION_CONSTRUCT_TYPE(SECTIONS)
```
This is not valid Fortran. I have fixed it to print without the clause
name.
The acc.bounds operation allows specifying a stride, but did not
clarify what the stride meant. The dialect was updated to note
specifically that the stride must capture inner dimension sizes when
specified for outer dimensions.
Flang lowering for OpenACC was also updated to adhere to this. This was
already the case for descriptor-based arrays; now it is also done for
all arrays.
The `async` clause was not handled in the same way on the `serial`,
`parallel`, and `kernels` directives. This patch updates the `ACC.td`
file and the Flang semantics to make the handling homogeneous.
The nesting of fir.dummy_scope operations defines the roots
of the TBAA forest. If we do not generate fir.dummy_scope
in functions that do not have any dummy arguments, then
the globals accessed in the function and the dummy arguments
accessed by the callee may end up in different sub-trees
of the same root. The added tbaa-with-dummy-scope2.fir
demonstrates the issue.
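A hedged sketch of the problematic shape (module and names illustrative):
```fortran
module m
  integer :: g
contains
  subroutine callee(d)
    integer :: d
    d = d + 1            ! access through the dummy argument
  end subroutine callee
  subroutine caller()    ! no dummy arguments: previously no fir.dummy_scope here
    g = 0                ! access through the global
    call callee(g)       ! after inlining, both accesses refer to the same memory
  end subroutine caller
end module m
```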
See new test. I inadvertently broke this behavior with a recent fix for
another problem, because the effects of the overloaded
TokenSequence::Put() member function on token merging were confusing.
Rename and document the various overloads.
… discontiguity
For dummy assumed-shape/-rank device arrays, test the associated actual
argument for stride-1 contiguity, and report an error when the actual
argument is known to not be stride-1 contiguous and nonempty, or a
warning when the actual argument is not known to be empty or
stride-1 contiguous.
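A hedged CUDA Fortran sketch of the diagnostics (interface and names illustrative):
```fortran
subroutine host_side(a)
  real, device :: a(10, 10)
  interface
    subroutine dev_sub(x)
      real, device :: x(:, :)        ! assumed-shape device dummy
    end subroutine dev_sub
  end interface
  call dev_sub(a(1:10:2, :))         ! error: actual is nonempty and not stride-1 contiguous
  call dev_sub(a)                    ! fine: whole array is stride-1 contiguous
end subroutine host_side
```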
The current parser can fail on "self(x * 2)" by recognizing just "x" as
a one-element list of object names and then failing at a higher level
because it never reached the right parenthesis. Add lookahead checks and
error recovery.
Fixes https://github.com/llvm/llvm-project/issues/135810.
The output of a compilation with the -fdebug-unparse-with-modules option
comprises its normal unparsed output along with the regenerated contents
of any modules that were required from module files. This is handy for
producing stand-alone test cases.
The modules' contents are generated by the same code that writes module
files, so they can contain some USE associations to private entities in
other modules that are necessary to complete local declarations, usually
initializers. Such USE associations to private entities are not flagged
as fatal errors when modules are read from module files, but they
currently are caught when the output produced by this option is being
read back in to the compiler.
Handle this case by softening the error to a warning when one module
uses a private entity from another with an alias containing the
non-conforming '$' character. (I could have omitted the message
altogether, but there are other valid warnings that will occur due to
undefined function result variables; further, I didn't want to provide a
general hole around the protection of private names.)
The present implementation of the intrinsic function SAME_TYPE_AS()
yields false positive .TRUE. results for distinct derived types that
happen to have the same name.
Replace it with an implementation that relies on derived type
information records denoting the same type if and only if they are at
the same location or are PDT instantiations of the same uninstantiated
derived type.
references from instantiated PDTs to their original types. (The derived
type information format supports these references already, but they were
not being set, perhaps because the current faulty SAME_TYPE_AS
implementation didn't need them, and nothing else does.)
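A hedged sketch of the false positive (module and type names illustrative):
```fortran
module m1
  type t
    integer :: i
  end type t
end module m1
module m2
  type t
    real :: x
  end type t
end module m2
program p
  use m1, only: t1 => t
  use m2, only: t2 => t
  class(*), allocatable :: a, b
  allocate(t1 :: a)
  allocate(t2 :: b)
  print *, same_type_as(a, b)   ! must print F; the old implementation printed T
end program p
```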
Fixes https://github.com/llvm/llvm-project/issues/135580.
This change generalizes SumAsElemental inlining in
SimplifyHLFIRIntrinsics pass so that it can be applied
to ALL, ANY, COUNT, MAXLOC, MAXVAL, MINLOC, MINVAL, SUM.
This change makes the special handling of the reduction
operations in OptimizedBufferization redundant: once HLFIR
operations are inlined, the hlfir.elemental inlining should
do the rest of the job.
In CUDA Fortran the stream is encoded in an INTEGER(cuda_stream_kind)
variable.
This information is carried over to the GPU dialect through
`cuf.stream_cast` and the token on the GPU ops.
When converting `gpu.launch_func` to a runtime call, the
`cuf.stream_cast` becomes a no-op and the reference to the stream is
passed to the runtime.
The runtime is adapted to take integer references instead of values for
streams.
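A hedged CUDA Fortran sketch of the user-facing feature (the kernel and names are illustrative):
```fortran
module kernels
  use cudafor
contains
  attributes(global) subroutine set_zero(a)
    real, device :: a(*)
    a(threadIdx%x) = 0.0
  end subroutine set_zero
end module kernels

subroutine launch(a_d, n)
  use cudafor
  use kernels
  integer :: n, istat
  real, device :: a_d(n)
  integer(kind=cuda_stream_kind) :: stream
  istat = cudaStreamCreate(stream)
  ! The stream variable is now passed by reference (via cuf.stream_cast)
  ! down to the runtime launch call.
  call set_zero<<<1, n, 0, stream>>>(a_d)
  istat = cudaStreamDestroy(stream)
end subroutine launch
```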
When generating `acc.loop`, the IV was always implicitly privatized.
However, if the user explicitly privatized it, the IR generated wasn't
quite right.
For example:
```
!$acc loop private(i)
do i = 1, n
a(i) = b(i)
end do
```
The IR generated looked like:
```
%65 = acc.private varPtr(%19#0 : !fir.ref<i32>) -> !fir.ref<i32>
{implicit = true, name = "i"}
%66:2 = hlfir.declare %65 {uniq_name = "_QFEi"} : (!fir.ref<i32>) ->
(!fir.ref<i32>, !fir.ref<i32>)
%67 = acc.private varPtr(%66#0 : !fir.ref<i32>) -> !fir.ref<i32>
{name = "i"}
acc.loop private(@privatization_ref_i32 -> %65 : !fir.ref<i32>,
@privatization_ref_i32 -> %67 : !fir.ref<i32>) control(%arg0 : i32) =
(%c1_i32_46 : i32) to (%c10_i32_47 : i32) step (%c1_i32_48 : i32) {
fir.store %arg0 to %66#0 : !fir.ref<i32>
```
In order to fix this, we first process all of the clauses. Then, when
attempting to generate the implicit private IV, we look for an already
existing data clause operation.
The result is the following IR:
```
%65 = acc.private varPtr(%19#0 : !fir.ref<i32>) -> !fir.ref<i32>
{name = "i"}
%66:2 = hlfir.declare %65 {uniq_name = "_QFEi"} : (!fir.ref<i32>) ->
(!fir.ref<i32>, !fir.ref<i32>)
acc.loop private(@privatization_ref_i32 -> %65 : !fir.ref<i32>)
control(%arg0 : i32) = (%c1_i32_46 : i32) to (%c10_i32_47 : i32) step
(%c1_i32_48 : i32) {
fir.store %arg0 to %66#0 : !fir.ref<i32>
```
Update `cuf.kernel_launch` to take the stream as a reference. Update the
conversion to insert the `cuf.stream_cast` op so the stream can be set
as a dependency.
When the mask is scalar, it is incorrect to cast it to
!fir.box<!fir.array<1xlogical<>>>, because the coordinate
operation will try to read the dim-1 stride from the box
to get the address of the first element. Even though
the stride value will be multiplied by 0, and does not matter,
it is still a read past the allocated box object.
Instead, we should just use box_addr to get the address
of the scalar mask.
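For context, a scalar mask arises from ordinary code such as this hedged sketch (the intrinsic chosen is illustrative):
```fortran
subroutine masked_sum(a, cond, r)
  real :: a(:), r
  logical :: cond          ! scalar MASK argument
  ! Per the change, lowering should use box_addr for a scalar mask instead of
  ! casting it to a rank-1 array box and reading a stride from that box.
  r = sum(a, mask=cond)
end subroutine masked_sum
```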
Cast a stream object reference to a GPU async token. This is useful for
connecting the stream representation of CUDA Fortran with the
async mechanism of the GPU dialect.
This op will later become a no-op.
NOWAIT was a tricky one because the clause can be on either the start or
the end directive. I couldn't find a convenient way to access the end
directive from the CANCEL directive nested inside of the construct, but
there are convenient ways to access the start directive. I have added a
list to the start directive context containing the clauses from the end
directive.
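A hedged sketch of the shape being checked (clause placement per the passage; names illustrative):
```fortran
subroutine work()
  !$omp parallel
  !$omp sections
  !$omp section
  !$omp cancel sections          ! the check must see NOWAIT from the end directive
  !$omp end sections nowait
  !$omp end parallel
end subroutine work
```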
The TBAA generation gives conservative TBAA metadata when handling an
access of a record type with a descriptor member, since the access may
be a regular data access OR another descriptor. Array members were being
incorrectly identified as non-descriptor-members, and were giving
incorrect TBAA metadata which led to bugs showing up in the optimizer
when LLVM encountered mismatching TBAA.
`fir::isRecordWithDescriptorMember` now unwraps sequence types before
checking for descriptor members.
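A hedged sketch of a type shape that was misclassified (type and member names illustrative):
```fortran
subroutine touch()
  type inner
    real, allocatable :: a(:)     ! descriptor (allocatable) member
  end type inner
  type outer
    type(inner) :: items(4)       ! array member whose element type holds a descriptor
  end type outer
  type(outer) :: rec
  allocate(rec%items(1)%a(1))
  ! Accesses into rec must keep the conservative "descriptor member" TBAA;
  ! before the fix the array member hid the descriptor from the check.
  rec%items(1)%a(1) = 0.0
end subroutine touch
```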
This reverts commit 04b87e15e40f8857e29ade8321b8b67691545a50.
The reasons for reverting are the following:
1. I still need to upstream some parts of the do concurrent to
OpenMP pass from our downstream implementation and taking this in
downstream will make things more difficult.
2. I still need to work on a solution for modeling locality specifiers
on `hlfir.do_concurrent` ops. I would prefer to do that and merge the
entire stack together instead of having a partial solution.
After merging the revert I will reopen the original PR and keep it
updated against main until I finish the above.
Adds support for lowering `do concurrent` nests from PFT to the new
`fir.do_concurrent` MLIR op as well as its special terminator
`fir.do_concurrent.loop` which models the actual loop nest.
To that end, this PR emits the allocations for the iteration variables
within the block of the `fir.do_concurrent` op and creates a region for
the `fir.do_concurrent.loop` op that accepts arguments equal in number
to the number of the input `do concurrent` iteration ranges.
For example, given the following input:
```fortran
do concurrent(i=1:10, j=11:20)
end do
```
the changes in this PR emit the following MLIR:
```mlir
fir.do_concurrent {
%22 = fir.alloca i32 {bindc_name = "i"}
%23:2 = hlfir.declare %22 {uniq_name = "_QFsub1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
%24 = fir.alloca i32 {bindc_name = "j"}
%25:2 = hlfir.declare %24 {uniq_name = "_QFsub1Ej"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
fir.do_concurrent.loop (%arg1, %arg2) = (%18, %20) to (%19, %21) step (%c1, %c1_0) {
%26 = fir.convert %arg1 : (index) -> i32
fir.store %26 to %23#0 : !fir.ref<i32>
%27 = fir.convert %arg2 : (index) -> i32
fir.store %27 to %25#0 : !fir.ref<i32>
}
}
```