719 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
1d05d693a1
[flang][cuda] Fix offset with multiple assumed size shared array (#154844)
When multiple assumed-size variables are used in a kernel with dynamic
shared memory, each variable uses the 0 offset. Update the pass to
account for that.

```
attributes(global) subroutine testany( a )
    real(4), shared :: smasks(*)
    real(8), shared :: dmasks(*)
end subroutine
```
2025-08-21 21:51:43 +00:00
Chaitanya
4a3bf27c69
[OpenMP] Introduce omp.target_allocmem and omp.target_freemem omp dialect ops. (#145464)
This PR introduces two new ops in omp dialect, omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: Allocates heap memory on device. Will be lowered to
omp_target_alloc call in llvm.
omp.target_freemem: Deallocates heap memory on device. Will be lowered
to omp_target_free call in llvm.


Example:
  %1 = omp.target_allocmem %device : i32, i64
  omp.target_freemem %device, %1 : i32, i64

The work in this PR is copied from/inspired by @ivanradanov's commits from
the coexecute implementation:
[Add fir omp target alloc and free
ops](be860ac8ba)
[Lower omp_target_{alloc,free} to
llvm](6e2d584dc9)
2025-08-18 18:15:11 +05:30
Terapines MLIR
c164e6309b
[flang][fir] Add conversion of fir.iterate_while to scf.while. (#152439)
This commit is a supplement for
https://github.com/llvm/llvm-project/pull/140374.

RFC: https://discourse.llvm.org/t/rfc-add-fir-affine-optimization-fir-pass-pipeline/86190/6
2025-08-14 13:39:55 +08:00
Slava Zakharin
b8e4232bd2
[flang] Cast fir.select[_rank] selector to i64. (#153239)
Properly cast the selector to `i64` regardless of its integer type.
We used to generate llvm.trunc always.

We have to use `i64` as long as the case values may exceed INT_MAX.

Fixes #153050.
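
A minimal sketch of a width-dependent cast to `i64`, modelling integers as
raw bit patterns. The helper name `cast_selector_to_i64` is hypothetical, and
the choice of sign extension for narrower selectors is an assumption for
illustration, not something taken from the patch.

```python
def cast_selector_to_i64(bits: int, width: int) -> int:
    """Return the 64-bit pattern for a selector given as a `width`-bit pattern.

    Assumption: narrower selectors are sign-extended, wider ones are
    truncated, and 64-bit selectors pass through unchanged.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    mask64 = (1 << 64) - 1
    bits &= (1 << width) - 1              # keep only the selector's own bits
    if width > 64:
        return bits & mask64              # truncate to the low 64 bits
    if width < 64 and bits >> (width - 1):
        bits -= 1 << width                # sign bit set: treat as negative
    return bits & mask64                  # two's-complement 64-bit pattern
```

Always truncating cannot work for a selector narrower than the target type,
which is why the cast has to depend on the source width.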
2025-08-12 16:43:44 -07:00
Terapines MLIR
8e9ca057eb
[flang][fir] Add conversion of fir.if to scf.if. (#149959)
This commit is a supplement for
https://github.com/llvm/llvm-project/pull/140374.

RFC: https://discourse.llvm.org/t/rfc-add-fir-affine-optimization-fir-pass-pipeline/86190/6
2025-07-25 10:03:50 +08:00
Kelvin Li
df56b1a2cf
[flang] handle allocation of zero-sized objects (#149165)
This PR handles the allocation of zero-sized objects for different
implementations. One byte is allocated for the zero-sized objects.
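
The rule can be sketched as a size clamp applied before the allocation call.
The function name `allocation_size` is hypothetical; this only illustrates the
"one byte for zero-sized objects" rule and is not the code from the PR.

```python
def allocation_size(requested_bytes: int) -> int:
    """Return the number of bytes to request from the allocator.

    Zero-sized objects still get one byte, so the allocation yields a
    valid, distinct address.
    """
    if requested_bytes < 0:
        raise ValueError("size must be non-negative")
    return max(requested_bytes, 1)
```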
2025-07-17 23:52:48 -04:00
Valentin Clement (バレンタイン クレメン)
1e4e2b332d
[flang][cuda] Import type descriptor in the gpu module when needed (#149157)
2025-07-16 14:12:27 -07:00
Valentin Clement (バレンタイン クレメン)
2c6771889a
[flang][cuda] Introduce cuf.set_allocator_idx operation (#148717)
2025-07-14 17:23:18 -07:00
Slava Zakharin
4775b96898
[flang] Optimize redundant array repacking. (#147881)
This patch allows optimizing redundant array repacking, when
the source array is statically known to be contiguous.
This is part of the implementation plan for the array repacking
feature, though, it does not affect any real life use case
as long as FIR inlining is not a thing. I experimented with
simple cases of FIR inlining using `-inline-all`, and I recorded
these cases in optimize-array-repacking.fir tests.
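
To illustrate what "contiguous" means here, the sketch below checks
column-major contiguity from extents and byte strides. The patch itself relies
on static information in FIR; the function name `is_contiguous` and the runtime
form of the check are assumptions for illustration only.

```python
from typing import Sequence


def is_contiguous(extents: Sequence[int], strides: Sequence[int],
                  elem_size: int) -> bool:
    """Column-major contiguity check using byte strides.

    A repacking copy is redundant whenever this returns True.
    """
    if len(extents) != len(strides):
        raise ValueError("extents and strides must have the same rank")
    if any(extent == 0 for extent in extents):
        return True                       # an empty array is trivially contiguous
    expected = elem_size                  # stride expected for the first dimension
    for extent, stride in zip(extents, strides):
        if extent != 1 and stride != expected:
            return False                  # gap between consecutive elements
        expected *= extent                # next dimension steps over this one
    return True
```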
2025-07-14 09:41:42 -07:00
Slava Zakharin
fc99ef7411
[flang] Allow embox's source_box to be a !fir.box. (#148305)
In order to create temporary copies of assumed-type arrays
(e.g. for `-frepack-arrays`), we have to allow the source_box
to be a !fir.box.

This patch replaces #147618.
2025-07-14 09:40:42 -07:00
Christian Ulmann
374d5da214
[MLIR][Interfaces] Remove negative branch weight verifier (#148234)
This commit removes the verifier that checked if branch weights are
negative. This check was too strict because weights are interpreted as
unsigned integers.

This showed up when running the verifier on LLVM dialect modules that
were imported from LLVM IR.
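
To illustrate why a negative-looking weight can be legitimate, here is a
minimal sketch that reinterprets a signed 32-bit value as unsigned. The helper
name `as_unsigned_i32` is hypothetical, and the 32-bit width is an assumption
for illustration.

```python
def as_unsigned_i32(weight: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    if not -(1 << 31) <= weight < (1 << 32):
        raise ValueError("value does not fit in 32 bits")
    return weight & 0xFFFFFFFF
```

Under this reading, a weight stored as `-1` is the largest possible weight
rather than an invalid one.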
2025-07-14 07:34:29 +02:00
Kareem Ergawy
eba35cc1c0
[flang][do concurrent] Re-model reduce to match how reductions are modelled in OpenMP and OpenACC (#145837)
This PR proposes re-modelling `reduce` specifiers to match OpenMP and
OpenACC. In particular, this PR includes the following:

* A new `fir` op: `fir.declare_reduction` which is identical to OpenMP's
`omp.declare_reduction` op.
* Updating the `reduce` clause on `fir.do_concurrent.loop` to use the
new op.
* Re-uses the `ReductionProcessor` component to emit reductions for `do
concurrent` just like we do for OpenMP. To do this, the
`ReductionProcessor` had to be refactored to be more generalized.
* Updates mapping `do concurrent` to `fir.loop ... unordered` nests using
the new reduction model.

Unfortunately, this is a big PR that would be difficult to divide into
smaller parts because the base of the changes is the `fir` table-gen
changes to `do concurrent`. These MLIR changes cascade
to the other parts, which have to be modified to not break things.

This PR goes in the same direction we went for `private/local`
specifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects
are modelled in essentially the same way which makes mapping between
them more trivial, hopefully.

PR stack:
- https://github.com/llvm/llvm-project/pull/145837 (this one)
- https://github.com/llvm/llvm-project/pull/146025
- https://github.com/llvm/llvm-project/pull/146028
- https://github.com/llvm/llvm-project/pull/146033
2025-07-11 06:39:30 +02:00
Razvan Lupusoru
4859b92b7f
[flang][acc] Update FIR ref, heap, and pointer to be MappableType (#147834)
The MappableType OpenACC type interface is a richer interface that
allows the OpenACC dialect to interact better with a source
dialect, FIR in this case. fir.box and fir.class types already
implemented this interface. Now the same is being done with the other
FIR types that represent variables.

One additional notable change is that fir.array no longer implements
this interface. This is because MappableType is primarily intended for
variables - and FIR variables of this type have storage associated and
thus there's a pointer-like type (fir.ref/heap/pointer) that holds the
array type.

The end goal of promoting these FIR types to MappableType is that we
will soon implement the ability to generate recipes outside of the frontend
via this interface.
2025-07-10 15:23:57 -07:00
Tom Eccles
ed17bf1e4c
[flang] Fix tests broken by #146734 (#147055)
These tests referred to privatizers which were never declared
2025-07-04 14:50:29 +01:00
Razvan Lupusoru
f16983f7d0
[flang][acc] Ensure fir.class is handled in type categorization (#146174)
fir.class is treated similarly to fir.box - but it has one key
distinction which is that it doesn't hold an element type. Thus the
categorization logic was mishandling this case for this reason (and also
the fact that it assumed that a base object is always a fir.ref).

This PR improves this handling and adds appropriate test exercising both
a class and a class field to ensure categorization works.
2025-06-30 15:04:14 -07:00
Valentin Clement (バレンタイン クレメン)
f4cecfe1bb
[flang][cuda] Bring PARAMETER arrays into the GPU module (#146416)
2025-06-30 14:24:44 -07:00
jeanPerier
22ee837ec0
[flang][NFC] do not copy fields in fir::RecordType::getTypeList (#145530)
For historical reasons, `fir::RecordType::getTypeList` was returning a
std::vector, causing the entire field list to be copied when called.

It is called a lot indirectly in all type helpers, which themselves are
called a lot in derived type heavy code like WRF.
The `fir::hasDynamicType` helper is also called a lot, and it can just
check for length parameters to avoid looping on all derived type
components in most cases.
2025-06-25 11:51:07 +02:00
Lei Huang
d715ecba79
Revert "[flang][fir] Add fir.if -> scf.if and add filecheck test … (#142965)" (#145345)
This reverts commit 823750d873dff1d03865900042fc9b58e0f7f9c3.

Test causes segfault on aix flang builder.
2025-06-23 16:46:47 -04:00
Slava Zakharin
70343c8d44
[mlir][flang] Added Weighted[Region]BranchOpInterface's. (#142079)
The new interfaces provide getters and setters for the weight
information about the branches of BranchOpInterface and
RegionBranchOpInterface operations.

These interfaces are done the same way as LLVM dialect's
BranchWeightOpInterface.

The plan is to produce this information in Flang, e.g. mark
most probably "cold" code as such and allow LLVM to order
basic blocks accordingly. An example of such code is the
copy loops generated for array repacking - we can mark them
as "cold" assuming that the copy will not happen dynamically.
If the copy actually happens the overhead of the copy is probably high
enough so that we may not care about the little overhead
of jumping to the "cold" code and fetching it.
2025-06-17 16:14:13 -07:00
Kareem Ergawy
282e471018
[flang] Erase fir.local ops before lowering fir to llvm (#143687)
`fir.local` ops are not supposed to have any uses at this point (i.e.
during lowering to LLVM). In case of serialization, the
`fir.do_concurrent` users are expected to have been lowered to
`fir.do_loop` nests. In case of parallelization, the `fir.do_concurrent`
users are expected to have been lowered to the target parallel model
(e.g. OpenMP).

This hopefully resolves a build issue introduced by
https://github.com/llvm/llvm-project/pull/142567 (see for example:
https://lab.llvm.org/buildbot/#/builders/199/builds/4009).
2025-06-12 05:58:55 +02:00
Jameson Nash
082251bba4
[AArch64] fix trampoline implementation: use X15 (#126743)
AAPCS64 reserves any of X9-X15 for a compiler to choose to use for this
purpose, and says not to use X16 or X18 like GCC (and the previous
implementation) chose to use. The X18 register may need to get used by
the kernel in some circumstances, as specified by the platform ABI, so
it is generally an unwise choice. Simply choosing a different register
fixes the problem of this being broken on any platform that actually
follows the platform ABI (which is all of them except EABI, if I am
reading this linux kernel bug correctly
https://lkml2.uits.iu.edu/hypermail/linux/kernel/2001.2/01502.html). As
a side benefit, this also generates slightly better code and avoids needing
compiler-rt to be present. I did that by following the XCore
implementation instead of PPC (although in hindsight, following the
RISCV might have been slightly more readable). That X18 is wrong to use
for this purpose has been known for many years (e.g.
https://www.mail-archive.com/gcc@gcc.gnu.org/msg76934.html) and also
known that fixing this to use one of the correct registers is not an ABI
break, since this only appears inside of a translation unit. Some of the
other temporary registers (e.g. X9) are already reserved inside llvm for
internal use as a generic temporary register in the prologue before
saving registers, while X15 was already used in rare cases as a scratch
register in the prologue as well, so I felt that seemed the most logical
choice to choose here.
2025-06-11 21:49:01 -04:00
Pranav Bhandarkar
f993f362ef
[Flang][OpenMP] - When mapping a fir.boxchar, map the underlying data pointer as a member (#141715)
This PR adds functionality to the `MapInfoFinalization` pass wherein the
underlying data pointer of a `fir.boxchar` is mapped as a member of the
parent boxchar.
2025-06-10 13:09:32 -05:00
Dominik Adamski
007d29e30c
[Flang] Turn on alias analysis for locally allocated objects (#143489)
Previously, a bug in the MemCpyOpt LLVM IR pass caused issues with
adding alias tags for locally allocated objects for Fortran code.

However, the bug has now been fixed (https://github.com/llvm/llvm-project/pull/129537 ),
and we can safely enable alias tags for these objects. This change should
improve the accuracy of the alias analysis.

More accurate alias analysis assumes that Cray pointers do not alias
with other variables. This assumption is common among other compilers.
If the code violates this assumption, it can lead to incorrect results
(see: https://github.com/llvm/llvm-project/issues/141928)
2025-06-10 16:46:13 +02:00
Q
823750d873
[flang][fir] Add fir.if -> scf.if and add filecheck test file (#142965)
This commit is a supplement for
https://github.com/llvm/llvm-project/pull/140374.

RFC: https://discourse.llvm.org/t/rfc-add-fir-affine-optimization-fir-pass-pipeline/86190/6

---------

Co-authored-by: ZhiQiang Fan <zhiqiang.fan@terapines.com>
2025-06-10 15:43:24 +08:00
Pranav Bhandarkar
8395912895
[Flang] - Handle BoxCharType in fir.box_offset op (#141713)
To map `fir.boxchar` types reliably onto an offload target, such as a
GPU, the `omp.map.info` operation is used to map the underlying data
pointer (`fir.ref<fir.char<k, ?>>`) wrapped by the `fir.boxchar` MLIR
value. The `omp.map.info` operation needs a pointer to the underlying
data pointer.
Given a reference to a descriptor (`fir.box`), the `fir.box_offset` is
used to obtain the address of the underlying data pointer. This PR
extends `fir.box_offset` to provide the same functionality for
`fir.boxchar` as well.
2025-06-06 10:48:07 -05:00
Tom Eccles
d16ecad968
[flang] Disable noalias by default (#142128)
With these enabled we see a 70% performance regression for exchange2_r
on neoverse-v1 (aws graviton 3) using `-mcpu=native -Ofast -flto`. There
is also a smaller regression on neoverse-v2.

This appears to be because function specialization is no longer kicking
in during LTO for digits_2. This can be seen in the output executable:
previously it contained specialized copies of the function with names
like `_QMbrute_forcePdigits_2.specialized.4`. Now there are no names
like this.

The bug is not in flang - instead in the function specialization pass -
but due to the size of the regression I would like to request that this
is disabled until function specialization has been fixed.
2025-05-30 17:35:41 +01:00
Slava Zakharin
a0d699a8e6
Reland "[flang] Added noalias attribute to function arguments. (#140803)"
This helps to disambiguate accesses in the caller and the callee
after LLVM inlining in some apps. I did not see any performance
changes, but this is one step towards enabling other optimizations
in the apps that I am looking at.

The definition of llvm.noalias says:
```
... indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function.
```

I believe this exactly matches Fortran rules for the dummy arguments
that are modified during their subprogram execution.

I also set llvm.noalias and llvm.nocapture on the !fir.box<> arguments,
because the corresponding descriptors cannot be captured and cannot
alias anything (not based on them) during the execution of the
subprogram.
2025-05-29 13:42:57 -07:00
Slava Zakharin
6ee2453360
Revert "[flang] Added noalias attribute to function arguments." (#141884)
Reverts llvm/llvm-project#140803

Buildbot failure:
https://lab.llvm.org/buildbot/#/builders/143/builds/8041
2025-05-28 18:06:11 -07:00
Slava Zakharin
2426ac6865
[flang] Added noalias attribute to function arguments. (#140803)
This helps to disambiguate accesses in the caller and the callee
after LLVM inlining in some apps. I did not see any performance
changes, but this is one step towards enabling other optimizations
in the apps that I am looking at.

The definition of llvm.noalias says:
```
... indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function.
```

I believe this exactly matches Fortran rules for the dummy arguments
that are modified during their subprogram execution.

I also set llvm.noalias and llvm.nocapture on the !fir.box<> arguments,
because the corresponding descriptors cannot be captured and cannot
alias anything (not based on them) during the execution of the
subprogram.
2025-05-28 17:18:04 -07:00
MingYan
953302eb98
[flang][fir] Add FIR structured control flow ops to SCF dialect pass. (#140374)
This patch only supports the conversion from `fir.do_loop` to `scf.for`.
This pass is still experimental, and future work will focus on gradually
improving this conversion pass.

Co-authored-by: yanming <ming.yan@terapines.com>
2025-05-25 14:28:47 +08:00
Valentin Clement (バレンタイン クレメン)
6811a3bedf
[flang][cuda] Allocate extra descriptor in managed memory when it is coming from device (#140818)
2025-05-20 18:55:13 -07:00
jeanPerier
ed07412888
[flang] translate derived type array init to attribute if possible (#140268)
This patch relies on #140235 and #139724 to speed-up compilations of
files with derived type array global with initial value.
Currently, such derived type global init was lowered to an
llvm.mlir.insertvalue chain in the LLVM IR dialect because there was no
way to represent such value via attributes.

This chain was later folded in LLVM dialect to LLVM IR using LLVM IR
(not dialect) folding. This insert chain generation and folding is very
expensive for big arrays. For instance, this patch brings down the
compilation of FM_lib fmsave.f95 from 50s
to 0.5s.
2025-05-20 16:11:27 +02:00
Valentin Clement (バレンタイン クレメン)
f5609aa1b0
[flang][cuda] Use a reference for asyncObject (#140614)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.

New attempt with some fixes. The previous one was reverted some time ago.

Reviewed in #138010
2025-05-19 15:02:53 -07:00
jeanPerier
416b7dfaa0
[flang] use DataLayout instead of GEP to compute element size (#140235)
Now that the datalayout is part of codegen, use that to generate type
size constants in codegen instead of generating GEP.
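
As a rough illustration of deriving a size from layout information alone, the
sketch below computes the size of a struct-like type from per-member
(size, alignment) pairs. The function name `struct_size` and the
natural-alignment padding rule are assumptions for illustration; the real
DataLayout handles many more cases.

```python
from typing import Iterable, Tuple


def struct_size(members: Iterable[Tuple[int, int]]) -> int:
    """Size in bytes of a struct whose members are (size, alignment) pairs.

    Each member is placed at the next offset that satisfies its alignment,
    and the total is rounded up to the largest member alignment.
    """
    offset = 0
    max_align = 1
    for size, align in members:
        if size < 0 or align <= 0:
            raise ValueError("invalid member size or alignment")
        offset = (offset + align - 1) // align * align   # pad up to alignment
        offset += size
        max_align = max(max_align, align)
    return (offset + max_align - 1) // max_align * max_align  # tail padding
```

With such a computation the size becomes a plain constant, so no address
arithmetic needs to be emitted just to measure a type.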
2025-05-19 13:59:09 +02:00
Dominik Adamski
eb4fde9a4e
Revert "[Flang] Turn on alias analysis for locally allocated objects" (#140202)
Reverts llvm/llvm-project#139682 (commit: cf16c97bfa1416672d8990862369e86f360aa11e )
due to reported regression in Fujitsu Fortran test suite:
https://ci.linaro.org/job/tcwg_flang_test--main-aarch64-Ofast-sve_vla-build/2081/artifact/artifacts/notify/mail-body.txt/*view*/
2025-05-16 09:44:33 +02:00
Sergio Afonso
30b0946326
[Flang][MLIR][OpenMP] Improve use_device_* handling (#137198)
This patch updates MLIR op verifiers for operations taking arguments
that must always be defined by an `omp.map.info` operation to check this
requirement.

It also modifies Flang lowering for `use_device_{addr, ptr}`, as well as
the custom MLIR printer and parser for these clauses, to support
initializing it to `OMP_MAP_RETURN_PARAM` and represent this in the MLIR
representation as `return_param`. This internal mapping flag is what
eventually is used for variables passed via these clauses into the
target region when translating to LLVM IR, so making it explicit in
Flang and MLIR removes an inconsistency in the current representation.
2025-05-15 12:28:06 +01:00
Asher Mancinelli
f486cc4417
[flang] Add loop annotation attributes to the loop backedge (#126082)
Flang currently adds loop metadata to a conditional branch in the loop
preheader, while clang adds it to the loop latch's branch instruction.
Langref says:

> Currently, loop metadata is implemented as metadata attached to the
branch instruction in the loop latch block.
>
> https://llvm.org/docs/LangRef.html#llvm-loop

I misread langref a couple times, but I think this is the appropriate
branch op for the LoopAnnotationAttr. In a couple examples I found that
the metadata was lost entirely during canonicalization. This patch makes
the codegen look more like clang's and the annotations persist through
codegen.

* current clang: https://godbolt.org/z/8WhbcrnG3
* current flang: https://godbolt.org/z/TrPboqqcn
2025-05-14 07:07:57 -07:00
Dominik Adamski
cf16c97bfa
[Flang] Turn on alias analysis for locally allocated objects (#139682)
Previously, a bug in the MemCpyOpt LLVM IR pass caused issues with
adding alias tags for locally allocated objects for Fortran code.

However, the bug has now been fixed (
https://github.com/llvm/llvm-project/pull/129537 ), and we can safely
enable alias tags for these objects. This change should improve the
accuracy of the alias analysis.
2025-05-14 09:21:18 +02:00
Asher Mancinelli
bbb7f01481
[flang] Fix volatile attribute propagation on allocatables (#139183)
Ensure volatility is reflected not just on the reference to an
allocatable, but on the box, too. When we declare a volatile
allocatable, we now get a volatile reference to a volatile box.

Some related cleanups:
* SELECT TYPE constructs check the selector's type for volatility when
creating and designating the type used in the selecting block.
* Refine the verifier for fir.convert. In general, I think it is ok to
implicitly drop volatility in any ptr-to-int conversion because it means
we are in codegen (and representing volatility on the LLVM ops and
intrinsics) or we are calling an external function (are there any cases
I'm not thinking of?)
* An allocatable test that was XFAILed is now passing. Making
allocatables' boxes volatile resulted in accesses of those boxes being
volatile, which resolved some errors coming from the strict verifier.
* I noticed a runtime function was missing the fir.runtime attribute.
2025-05-13 08:13:47 -07:00
Slava Zakharin
2d12d31f44
[flang] Propagate contiguous attribute through HLFIR. (#138797)
This change allows marking more designators producing an opaque
box with 'contiguous' attribute, e.g. like in test1 case
in flang/test/HLFIR/propagate-contiguous-attribute.fir.
This would make isSimplyContiguous() return true for such
designators allowing merging hlfir.eval_in_mem with hlfir.assign
where the LHS is a contiguous array section.

Depends on #139003
2025-05-12 18:33:47 -07:00
MingYan
db2d5762eb
[flang][fir] Support promoting fir.do_loop with results to affine.for. (#137790)
Co-authored-by: yanming <ming.yan@terapines.com>
2025-05-09 10:55:21 +08:00
Kareem Ergawy
227e1ff73b
[flang][fir] Add locality specifiers modeling to fir.do_concurrent.loop (#138506)
2025-05-08 21:42:52 +02:00
Kareem Ergawy
a83bb35e99
[flang][fir] Add fir.local op for locality specifiers (#138505)
Adds a new `fir.local` op to model `local` and `local_init` locality
specifiers. This op is a clone of `omp.private`. In particular, this new
op also models the privatization/localization logic of an SSA value in
the `fir` dialect just like `omp.private` does for OpenMP.

PR stack:
- https://github.com/llvm/llvm-project/pull/137928
- https://github.com/llvm/llvm-project/pull/138505 (this PR)
- https://github.com/llvm/llvm-project/pull/138506
- https://github.com/llvm/llvm-project/pull/138512
- https://github.com/llvm/llvm-project/pull/138534
- https://github.com/llvm/llvm-project/pull/138816
2025-05-07 14:00:06 +02:00
Asher Mancinelli
7220fdad0c
[flang] Hide strict volatility checks behind flag (#138183)
Enabling volatility lowering by default revealed some issues in lowering
and op verification.

For example, given a volatile variable of a nested type, accessing
structure members of a structure member would result in a volatility
mismatch when the inner structure member is designated (and thus a
verification error at compile time).

In other cases, I found correct codegen when the checks were disabled,
also related to allocatable types and how we handle volatile references
of boxes.

This hides the strict verification of fir and hlfir ops behind a flag so
I can iteratively improve lowering of volatile variables without causing
compile-time failures, keeping the strict verification on when running
tests.
2025-05-02 09:03:20 -07:00
Valentin Clement (バレンタイン クレメン)
9b6b144438
Revert "[flang][cuda] Use a reference for asyncObject" (#138221)
Reverts llvm/llvm-project#138186
2025-05-01 17:41:44 -07:00
Valentin Clement (バレンタイン クレメン)
7f922f1400
[flang][cuda] Use a reference for asyncObject (#138186)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.

New attempt with some fixes. The previous one was reverted yesterday.
2025-05-01 17:04:12 -07:00
Valentin Clement (バレンタイン クレメン)
01a18809ee
Revert "[flang][cuda] Use a reference for asyncObject (#138010)" (#138082)
This reverts commit 9b0eaf71e674a28ee55be3afa11b5f7d4da732c0.
2025-04-30 22:03:26 -07:00
Valentin Clement (バレンタイン クレメン)
9b0eaf71e6
[flang][cuda] Use a reference for asyncObject (#138010)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.
2025-04-30 14:02:29 -07:00
Asher Mancinelli
8836bce842
[flang] Add lowering of volatile references (#132486)
[RFC on
discourse](https://discourse.llvm.org/t/rfc-volatile-representation-in-flang/85404/1)

Flang currently lacks support for volatile variables. For some cases,
the compiler produces TODO error messages and others are ignored. Some
of our tests are like the example from _C.4 Clause 8 notes: The VOLATILE
attribute (8.5.20)_ and require volatile variables.

Prior commits:
```
c9ec1bc753b0 [flang] Handle volatility in lowering and codegen (#135311)
e42f8609858f [flang][nfc] Support volatility in Fir ops (#134858)
b2711e1526f9 [flang][nfc] Support volatile on ref, box, and class types (#134386)
```
2025-04-30 08:46:33 -07:00
Kaviya Rajendiran
857ac4c229
[MLIR][OpenMP] Lowering nontemporal clause to LLVM IR for SIMD directive (#118751)
This patch,
- Added a new attribute `nontemporal` to fir.load and fir.store operation in the FIR dialect.
- Added a pass `lower-nontemporal` which is called before FIRToLLVM conversion pass and adds the nontemporal attribute to loads and stores on the list items specified in the nontemporal clause of the SIMD directive.
- Set the `UnitAttr:$nontemporal` to llvm.load and llvm.store operations during FIR to LLVM dialect conversion, if the corresponding fir.load or fir.store operations have the nontemporal attribute.
- Attached the `nontemporal metadata` to load and store instructions that have the nontemporal attribute, during LLVM dialect to LLVM IR translation.
2025-04-30 11:13:20 +05:30