llvm-project

Author	SHA1	Message	Date
Slava Zakharin	ab1db26272	[flang][hlfir] Fixed some finalization/deallocation issues. (#67047)
This set of commits resolves some of the issues with elemental calls producing results that may require finalization, and also some memory leak issues due to the missing deallocation of allocatable components of the temporary buffers created by the bufferization pass.
- [flang][runtime] Expose Finalize API for derived types.
- [flang][hlfir] Add 'finalize' attribute for DestroyOp.
- [flang][hlfir] Postpone result finalization for elemental calls. The results of elemental calls generated inside hlfir.elemental must not be finalized/destructed before they are copied into the resulting array. The finalization must be done on the array as a whole (e.g. there might be different scalar and array finalization routines). The finalization work is left to the hlfir.destroy corresponding to this hlfir.elemental.
- [flang][hlfir] Tighten requirements on hlfir.end_associate operand. If component deallocation might be required for the operand of hlfir.end_associate, we have to be able to get the variable shape/params to create a descriptor for calling the runtime. This commit adds verification that we can do so.
- [flang][hlfir] Lower argument clean-ups using valid hlfir.end_associate. The operand must be a Fortran entity, when allocatable component deallocation may be required.
- [flang][hlfir] Properly clean-up temporary buffers in bufferization pass. This commit combines changes for proper finalization and component deallocation of the temporary buffers. The finalization part relates to hlfir.destroy operations with 'finalize' attribute. The component deallocation might be invoked for both hlfir.destroy and hlfir.end_associate, if the operand is of a derived type with allocatable component(s). The changes are mostly in one function, so I decided not to split them.
- [flang][hlfir] Disable optimizations for hlfir.elemental requiring finalization. If hlfir.elemental is coupled with hlfir.destroy with 'finalize' attribute, the temporary array result of hlfir.elemental needs to be created for the purpose of finalization. We cannot do certain optimizations on such hlfir.elemental operations. I was not able to come up with a test for the OptimizedBufferization pass, but I put the check there as well.	2023-09-22 10:47:53 -07:00
Slava Zakharin	39b6c82c5d	[flang][hlfir] Better recognize non-overlapping array sections. (#65707) This is a copy of the corresponding ArrayValueCopy analysis for non-overlapping array slices. It is required to achieve the same performance for Polyhedron/nf, though additional changes are needed in the alias analysis to disambiguate host-associated accesses.	2023-09-08 09:01:37 -07:00
Slava Zakharin	09361b1974	[flang][hlfir] Allow expanding realloc assignments with scalar RHS.
F18 10.2.1.3 p. 3 states:
  If the variable is an unallocated allocatable array, expr shall have the same rank.
So if LHS is an array and RHS is a scalar, then LHS must be allocated and the assignment is performed according to F18 10.2.1.3 p. 5:
  If expr is a scalar and the variable is an array, the expr is treated as if it were an array of the same shape as the variable with every element of the array equal to the scalar value of expr.
This resolves performance regression in CPU2006/437.leslie3d caused by extra Assign runtime calls for ALLOCATABLE local arrays. Note that the extra calls do not add overhead themselves. The problem is that the descriptor for ALLOCATABLE is passed to the Assign runtime function, and this breaks the points-to analysis. Example:
```
      ALLOCATABLE DUDX(:),DUDY(:),DUDZ(:)
...
      ALLOCATE( QS(IMAX-1),FSK(IMAX-1,0:KMAX,ND),
     >          QDIFFZ(IMAX-1), RMU(IMAX-1), EKCOEF(IMAX-1),
     >          DUDX(IMAX-1),DUDY(IMAX-1),DUDZ(IMAX-1),
...
      DUDZ=0D0
...
      DO I = I1, I2
         DUDZ(I) =
     >      DZI * ABD * ((U(I,J,KBD) - U(I,J,KCD)) +
     >      8.0D0 * (U(I,J, KK) - U(I,J,KBD))) * R6I
```
When we are not lowering `DUDZ=0D0` to an Assign call, the `base_addr` of `DUDZ`'s descriptor is a result of `malloc`, and LLVM is able to figure out that the accesses through this `base_addr` cannot overlap with accesses of, for example, the module (global) variable DZI. This enables CSE and LICM for the loop, eventually resulting in clean vectorization. When `DUDZ`'s descriptor "escapes" to the Assign runtime function, there are no guarantees about where `base_addr` can point. I do not think this can be resolved by using any existing LLVM function/argument attributes. Maybe we will be able to communicate the no-aliasing information to LLVM using the `Full Restrict Support` representation. For the purpose of enabling HLFIR by default, I am just aligning the IR with what we have with FIR lowering.
Reviewed By: tblah
Differential Revision: https://reviews.llvm.org/D159391	2023-09-04 14:55:09 -07:00
Slava Zakharin	8f1671c065	[flang][hlfir] Allow hlfir.assign expansion for array slices.
This case is important for `Polyhedron/channel2`:
```
u(2:M-1,1:N,new) = u(2:M-1,1:N,old) &
  +2.d0*dt*f(2:M-1,1:N)*v(2:M-1,1:N,mid) &
  -2.d0*dt/(2.d0*dx)*g*dhdx(2:M-1,1:N)
```
The slices of `u` on the left and the right hand sides are completely disjoint when `old` and `new` differ, but both are unknown runtime values, so the slices may also be identical rather than disjoint. For the purpose of hlfir.assign expansion we do not care whether they are identical or disjoint. This kind of answer does not fit well into the alias analysis definition, so I added a very simplified check to handle this case. This drops the Icelake execution time from 120 to 70 seconds.
Reviewed By: tblah
Differential Revision: https://reviews.llvm.org/D159323	2023-09-01 12:09:23 -07:00
Slava Zakharin	cdd5b1629a	[flang][hlfir] Expand array hlfir.assign's. Expand hlfir.assign with in-memory array RHS and LHS into a loop nest with element-by-element assignments. For small arrays this may result in further loop nest unrolling enabling more value propagation and redundancy elimination. Note the change in flang/test/HLFIR/opt-bufferization.fir: the hlfir.assign inside hlfir.elemental gets expanded by the new pattern. Depends on D159151 Reviewed By: tblah Differential Revision: https://reviews.llvm.org/D159246	2023-08-31 08:46:26 -07:00
Slava Zakharin	e60dc8ed7e	[flang][hlfir] Expand hlfir.assign's with scalar RHS. Expanding hlfir.assign's with scalar RHS late in MLIR optimization pipeline allows LLVM to recognize most of them as simple memset loops. This is especially important for small size LHS arrays, because the assign loop nest may be completely unrolled enabling more value propagation. Reviewed By: tblah Differential Revision: https://reviews.llvm.org/D159151	2023-08-31 08:46:26 -07:00
Tom Eccles	66abe64466	[flang][hlfir] add an optimized bufferization pass
This pass is intended to spot cases where we can do better than the default bufferization and to rewrite those specific cases. Then the default bufferization (bufferize-hlfir pass) can handle everything else.
The transformation added in this patch rewrites simple element-wise updates to an array to a do-loop modifying the array in place instead of creating and assigning an array temporary.
See the RFC at https://discourse.llvm.org/t/rfc-hlfir-optimized-bufferization-for-elemental-array-updates
This patch gets the improvement to exchange2 but not the improvement to cam4 described in the RFC. I think the cam4 improvement will require better alias analysis. I aim to follow up to fix this in a later patch.
With changes since the RFC, the pass improves polyhedron channel2 by about 52%.
Depends on: D156805 D157718 D157626
Differential Revision: https://reviews.llvm.org/D157107	2023-08-18 09:51:22 +00:00

7 Commits