Set the return value of the runImpl function, which indicates whether
the IR has been changed, in a single place instead of setting it
separately for each instruction at its insertion into the worklist.
Further changes: Replace the if-else chain in the worklist processing
loop with a switch, and add test cases which demonstrate that the
"scalarize" function does not always add items to the worklist; hence a
worklist emptiness check cannot be used for the runImpl return value.
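The control flow described above can be illustrated with a minimal, self-contained sketch. It does not use the LLVM API: `Inst`, `Kind`, `RunResult` and `runImplSketch` are hypothetical stand-ins, and the assumption that a scalarized vector instruction queues nothing is made only to show why worklist emptiness is not a reliable "changed" signal.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for IR instructions; these are not LLVM types.
enum class Kind { FRem, FPToInt, IntToFP, Other };

struct Inst {
  Kind K;
  bool IsVector;
};

struct RunResult {
  bool Modified;        // what a runImpl-style function would return
  std::size_t Queued;   // number of items that reached the worklist
  std::size_t Expanded; // number of worklist items that were expanded
};

RunResult runImplSketch(const std::vector<Inst> &Function) {
  std::vector<Inst> Worklist;
  RunResult R{false, 0, 0};

  for (const Inst &I : Function) {
    if (I.K == Kind::Other)
      continue; // not handled by the pass

    // The only place where the result flag is set.
    R.Modified = true;

    if (I.IsVector)
      continue; // assumption: scalarized in place, nothing is queued

    Worklist.push_back(I);
  }
  R.Queued = Worklist.size();

  // Worklist processing with a switch instead of an if-else chain.
  while (!Worklist.empty()) {
    Inst I = Worklist.back();
    Worklist.pop_back();
    switch (I.K) {
    case Kind::FRem:
    case Kind::FPToInt:
    case Kind::IntToFP:
      ++R.Expanded; // placeholder for the per-kind expansion
      break;
    case Kind::Other:
      break; // never queued
    }
  }
  return R;
}
```

In this sketch, a function containing only a vector instruction reports a change although its worklist stays empty, which is the situation the added test cases are meant to cover.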
The last change to the pass in PR #158588 lost the assignment to the
"Modified" variable for one of the pass optimizations.
Add it back. This fixes the test failure in
`CodeGen/AMDGPU/itofp.i128.bf.ll` (in a
`LLVM_ENABLE_EXPENSIVE_CHECKS=ON` build).
Extend the existing "scalarize" function which is used for the
fp-integer conversion instruction expansion to BinaryOperator
instructions and reuse it for the frem expansion; a similar function
for scalarizing BinaryOperator instructions exists in the ExpandLargeDivRem
pass and this change is a step towards merging that pass with ExpandFp.
Further refactoring: Scalarize directly instead of using the
"ReplaceVector" as a worklist, rename "Replace" vector to "Worklist",
and hoist a check for unsupported scalable vectors to the top of the
instruction visiting loop.
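As an illustration of what scalarizing a binary operator means, the following sketch applies a scalar operation lane by lane to fixed-size vectors. It is not the pass's code: `scalarizeBinaryOp` and `scalarFRem` are hypothetical names, `std::array` stands in for a fixed-width IR vector, and `std::fmod` stands in for the scalar frem expansion.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical helper: applies a scalar binary operation to each lane of
// two fixed-size vectors, mirroring an extractelement / scalar op /
// insertelement sequence per lane.
template <std::size_t N, typename ScalarOp>
std::array<double, N> scalarizeBinaryOp(const std::array<double, N> &LHS,
                                        const std::array<double, N> &RHS,
                                        ScalarOp Op) {
  std::array<double, N> Result{};
  for (std::size_t Lane = 0; Lane < N; ++Lane)
    Result[Lane] = Op(LHS[Lane], RHS[Lane]);
  return Result;
}

// Stand-in for the scalar frem expansion (assumption: fmod semantics).
inline double scalarFRem(double X, double Y) { return std::fmod(X, Y); }
```

Scalable vectors have no compile-time lane count, so a loop of this shape cannot be unrolled for them; this is consistent with rejecting them before scalarization is attempted.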
Align the syntax used for the optimization level argument of the
expand-fp pass in textual descriptions of pass pipelines with the syntax
used by other passes taking a similar argument. That is, use e.g.
`expand-fp<O1>` instead of `expand-fp<opt-level=1>`.
As observed by @mikaelholmen, PR #130988
"[AMDGPU] Implement IR expansion for frem instruction" introduced a
regression on x86. Its changes led to the pass being skipped on
functions with the optnone attribute. @bjope also noted that a check
concerning the optnone handling is wrong.
This patch fixes both issues, which together resolves the regression. During
the review it was observed that, even before PR #130988, the pass would
not run on optnone functions with the new pass manager. This is also
fixed.
This patch implements a correctly rounded expansion of the frem
instruction in LLVM IR. This is useful for target architectures for
which such an expansion is too involved to be implemented in ISel
Lowering. The expansion is based on the code from the AMD device libs
and has been tested successfully against the OpenCL conformance tests on
amdgpu. The expansion is implemented in the preexisting "expand-fp"
pass. It replaces the expansion of "frem" in ISel for the amdgpu target;
it is enabled for targets which do not directly support "frem" and for
which no matching "fmod" LibCall is available.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
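To illustrate why a correct frem expansion is more involved than it may look, the sketch below contrasts a naive formula with `std::fmod`, which has the same remainder semantics (exact result, sign of the dividend). This is not the expansion from the patch or from the AMD device libs; `naiveFRem` and `referenceFRem` are hypothetical names used only for this comparison.

```cpp
#include <cmath>

// Naive remainder: the division is rounded, so for large quotients the
// truncated quotient is no longer the true integer quotient.
inline double naiveFRem(double X, double Y) {
  return X - std::trunc(X / Y) * Y;
}

// Reference semantics: exact remainder with the sign of the dividend.
inline double referenceFRem(double X, double Y) { return std::fmod(X, Y); }
```

For small quotients both agree, e.g. for 5.5 and 2.0 both yield 1.5. For 1e20 and 3.0 the exact remainder is 1.0, which the naive formula cannot produce because the rounded quotient is only accurate to a multiple of 4096.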
This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR
expansion for frem instruction" which implements the expansion of
another instruction in this pass. The more general name seems more
appropriate given this change and quite reasonable even without it.