…tValue
Current choice was only working out of accident on 64 bit machine, it
led to an implicit cast to smaller type on 32 bit machine. Use the exact
type instead.
Polyhedron/capacita,protein and CPU2000/facerec,wupwise showed up to
60% regression on x86 after #101421. The memcpy loops of the toAt and
fromAt arrays that are run to create the initial work item end up
being encoded as 'rep mov', and they add noticeable overhead
comparing to the total amount of work. 'rep mov' is not the best
choise for small size memcpy (e.g. when the array rank is 1 or 2,
it would be quite slow). Moreover, the rest of the stack related
setup is also noticeable for the simple cases.
I added a shortcut for the simple copy case, and also got rid
of the initial toAt/fromAt copies by allowing the CopyDescriptor
to use the external subscript storages.
Device compilers may fail to identify maximum stack size required
by a kernel that calls CopyElement due to potential recursive calls.
To avoid this, we can use dynamically allocated Stack. To avoid
dynamic allocations on the host for simple cases, the Stack
implementation
has a reserved space (that ends up being allocated on the program
stack).
I tested both pre-allocated and 0-reserve implementations on the host,
and all passed. The actual reserve values might be tuned as needed.
Spread, reshape, pack, and other transformational intrinsic runtimes are
using `CopyElement` utility to copy elements. This utility was dealing
with deep copies, but only when the allocatable components where
"immediate" components of the type being copied. If the allocatable
components were nested inside a nonpointer/nonallocatable component,
they were not deep copied, leading to bugs later when manipulating the
value (or double free when applying #81117).
Visit data components with allocatable components (using the
noDestructionNeeded flag to avoid expensive and useless type visit when
there are no such components).
Move the closure of the subset of flang/runtime/*.h header files that
are referenced by source files outside flang/runtime (apart from unit tests)
into a new directory (flang/include/flang/Runtime) so that relative
include paths into ../runtime need not be used.
flang/runtime/pgmath.h.inc is moved to flang/include/flang/Evaluate;
it's not used by the runtime.
Differential Revision: https://reviews.llvm.org/D109107
This is *not* user-defined derived type I/O, but rather Fortran's
built-in capabilities for using derived type data in I/O lists
and NAMELIST groups.
This feature depends on having the derived type description tables
that are created by Semantics available, passed through compilation
as initialized static objects to which pointers can be targeted
in the descriptors of I/O list items and NAMELIST groups.
NAMELIST processing now handles component references on input
(e.g., "&GROUP x%component = 123 /").
The C++ perspectives of the derived type information records
were transformed into proper classes when it was necessary to add
member functions to them.
The code in Semantics that generates derived type information
was changed to emit derived type components in component order,
not alphabetic order.
Differential Revision: https://reviews.llvm.org/D104485
Define APIs, naively implement, and add basic sanity unit tests for
the transformational intrinsic functions CSHIFT, EOSHIFT, PACK,
SPREAD, TRANSPOSE, and UNPACK. These are the remaining transformational
intrinsic functions that rearrange data without regard to type
(except for default boundary values in EOSHIFT); RESHAPE was already
in place as a stress test for the runtime's descriptor handling
facilities.
Code is in place to create copies of allocatable/automatic
components when transforming arrays of derived type, but it won't
do anything until we have derived type information being passed to the
runtime from the frontend.
Differential Revision: https://reviews.llvm.org/D102857