Introduce a new attribute `acc.specialized_routine` to mark functions
that have been specialized from a host function marked with
`acc.routine_info`.
The new attribute captures:
- A SymbolRefAttr referencing the original `acc.routine` operation
- The parallelism level via the new `ParLevel` enum
- The original function name (since specialized functions may be
renamed)
Example - before specialization:
```
acc.routine @routine_gang func(@foo) gang
acc.routine @routine_vector func(@foo) vector
func.func @foo() attributes {
acc.routine_info = #acc.routine_info<[@routine_gang, @routine_vector]>
} { ... }
```
After specialization, there are three functions:
the original function and two specialized versions (one per parallelism
level):
```
acc.routine @routine_gang func(@foo) gang
acc.routine @routine_vector func(@foo) vector
// Original function (unchanged)
func.func @foo() attributes {
acc.routine_info = #acc.routine_info<[@routine_gang, @routine_vector]>
} { ... }
// Specialized for gang parallelism
func.func @foo_gang() attributes {
acc.specialized_routine = #acc.specialized_routine<@routine_gang,
<gang_dim1>, "foo">
} { ... }
// Specialized for vector parallelism
func.func @foo_vector() attributes {
acc.specialized_routine = #acc.specialized_routine<@routine_vector,
<vector>, "foo">
} { ... }
```
This commit introduces the ACCImplicitDeclare pass to the OpenACC
dialect, complementing ACCImplicitData by handling global variables
referenced in OpenACC compute regions and routines.
Overview:
---------
The pass applies implicit `acc declare` actions to global variables
referenced in OpenACC regions. While the OpenACC spec focuses on
implicit data mapping (handled by ACCImplicitData), implicit declare is
advantageous and required for specific cases:
1. Globals referenced in implicit `acc routine` - Since data mapping
only applies to compute regions, globals in routines must use `acc
declare`.
2. Compiler-generated globals - Type descriptors, runtime names, and
error reporting strings introduced during compilation that wouldn't be
visible for user-provided `acc declare` directives.
3. Constant globals - Constants like filename strings or initialization
values benefit from being marked with `acc declare` rather than being
mapped repeatedly (e.g., 1000 kernel launches shouldn't map the same
constant 1000 times).
Implementation:
---------------
The pass performs this in two phases:
1. Hoisting: Non-constant globals in compute regions have their
address-of operations hoisted out of the region when possible, allowing
implicit data mapping instead of declare marking.
2. Declaration: Remaining that must be device available (constants,
globals in routines, globals in recipe operations) are marked with the
acc.declare attribute.
The pass processes:
- OpenACC compute constructs (parallel, kernels, serial)
- Functions marked with acc routine
- Private, firstprivate, and reduction recipes (when used)
- Initialization regions of existing declared globals
Requirements:
-------------
The pass requires operations to implement:
- acc::AddressOfGlobalOpInterface (for address-of ops)
- acc::GlobalVariableOpInterface (for global definitions)
- acc::IndirectGlobalAccessOpInterface (for indirect access)