Previously, power-of-2 div/rem operations wider than
MaxLegalDivRemBitWidth were excluded from IR expansion and left for
backend peephole optimizations. Some backends fail to process such
instructions when DAGCombiner is disabled.
Now ExpandIRInsts expands them into shift/mask sequences:
- udiv X, 2^C -> lshr X, C
- urem X, 2^C -> and X, (2^C - 1)
- sdiv X, 2^C -> bias adjustment + ashr X, C
- srem X, 2^C -> X - (((X + Bias) >> C) << C)
Special cases handled:
- Division/remainder by 1 or -1 (identity, negation, or zero)
- Exact division (sdiv exact skips bias, produces ashr exact)
- Negative power-of-2 divisors (result is negated)
- INT_MIN divisor (correct via countr_zero on bit pattern)
Proofs: https://alive2.llvm.org/ce/z/Y-iWm-
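
For illustration only, the shift/mask sequences listed above can be
modelled on 32-bit integers in C++. This is a sketch of the arithmetic,
not the code the pass emits: the function names are hypothetical, the
divisor is passed as a value of the form +/-2^C, and it assumes that `>>`
on a signed value is an arithmetic shift (guaranteed since C++20 and the
behaviour of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of trailing zero bits of a nonzero pattern.
// Applied to the bit pattern of the divisor, so INT32_MIN yields 31.
static unsigned trailing_zeros(uint32_t v) {
  assert(v != 0);
  unsigned n = 0;
  while ((v & 1u) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// udiv X, 2^C -> lshr X, C
uint32_t udiv_pow2(uint32_t x, unsigned c) {
  assert(c < 32);
  return x >> c;
}

// urem X, 2^C -> and X, (2^C - 1)
uint32_t urem_pow2(uint32_t x, unsigned c) {
  assert(c < 32);
  return x & ((uint32_t{1} << c) - 1);
}

// X + Bias, where Bias is 2^C - 1 for negative X and 0 otherwise, so that
// the following arithmetic shift rounds toward zero instead of toward
// negative infinity. Computed in unsigned arithmetic to avoid overflow.
static uint32_t add_bias(int32_t x, unsigned c) {
  uint32_t bias = x < 0 ? (uint32_t{1} << c) - 1 : 0;
  return static_cast<uint32_t>(x) + bias;
}

// sdiv X, D with D = +/-2^C: bias adjustment, ashr, then negate if D < 0.
int32_t sdiv_pow2(int32_t x, int32_t d) {
  unsigned c = trailing_zeros(static_cast<uint32_t>(d));
  int32_t q = static_cast<int32_t>(add_bias(x, c)) >> c;
  if (d < 0)
    q = static_cast<int32_t>(0u - static_cast<uint32_t>(q));
  return q;
}

// srem X, D with D = +/-2^C: X - (((X + Bias) >> C) << C).
// The sign of the divisor does not affect the remainder.
int32_t srem_pow2(int32_t x, int32_t d) {
  unsigned c = trailing_zeros(static_cast<uint32_t>(d));
  int32_t q = static_cast<int32_t>(add_bias(x, c)) >> c;
  uint32_t multiple = static_cast<uint32_t>(q) << c;
  return static_cast<int32_t>(static_cast<uint32_t>(x) - multiple);
}
```

Dividing INT32_MIN by -1 overflows in both the native operation and this
model, so it is not a meaningful comparison point.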
Assisted-by: Cursor // Claude Opus 4.6
Add support for expanding fptosi.sat and fptoui.sat via IR expansions.
Similar to fptosi/fptoui we would get legalization errors otherwise.
The previous expansion for fptosi/fptoui was already saturating -- but
those instructions do not actually require saturation, and the
implementation of the saturation was incorrect in lots of ways. What
this PR does is:
* For fptosi, remove the unnecessary saturation handling.
* For fptoui, remove the unnecessary saturation handling and sign
multiplication.
* For fptosi.sat, use the previous saturation handling with fixes: we need
to map NaNs to 0, and the saturation condition on the exponent was
incorrect. (I'm performing the NaN check via fcmp -- there's no
requirement to do everything bitwise here.)
* For fptoui.sat, use a variation of the signed saturation handling:
negative values need to go to zero and we saturate to unsigned max.
Proofs: https://alive2.llvm.org/ce/z/Xv9FNd
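
As a reference for the intended results (not the bitwise expansion the
pass generates), the saturating semantics described above can be sketched
in C++ for float to 32-bit conversions. The function names are
hypothetical and the comparisons stand in for the exponent checks: NaN
maps to 0, out-of-range values clamp to the integer limits, and negative
inputs to the unsigned variant go to zero.

```cpp
#include <cstdint>
#include <limits>

// Model of fptosi.sat for float -> i32.
int32_t fptosi_sat_i32(float f) {
  if (f != f) // NaN compares unequal to itself
    return 0;
  if (f <= -2147483648.0f) // at or below -2^31
    return std::numeric_limits<int32_t>::min();
  if (f >= 2147483648.0f) // at or above 2^31
    return std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(f); // in range: truncate toward zero
}

// Model of fptoui.sat for float -> i32 (unsigned interpretation).
uint32_t fptoui_sat_u32(float f) {
  if (f != f) // NaN
    return 0;
  if (f <= 0.0f) // negative values (and zero) go to zero
    return 0;
  if (f >= 4294967296.0f) // at or above 2^32
    return std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(f); // in range: truncate toward zero
}
```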
Handle this case by extending the integer to a wider type. This can
probably be handled more optimally, but this is conservatively correct.
Proof: https://alive2.llvm.org/ce/z/0RwDO1
I don't think anything here requires the integer bit width to be
strictly larger. It's fine if it's the same (in which case some zexts
just go away).
Add tests on half + i32 that can be verified by alive2. Note that half
is handled via float, so the minimum supported type is i32 rather than
i16.
Proof (uitofp): https://alive2.llvm.org/ce/z/CsMfkU
Proof (sitofp): https://alive2.llvm.org/ce/z/jzuxyt
In order to support expanding fptoi where the target type is smaller,
make most of the code work on the float-as-integer type, rather than the
target type of the cast. We only need to cast the final result to the
target type, or prior to performing a left shift.
This not only allows us to handle casts to a smaller type, but also
avoids performing intermediate calculations on unnecessarily large
types.
This also matches how compiler-rt handles this:
https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/fp_fixint_impl.inc
Proof: https://alive2.llvm.org/ce/z/3pJ9pE
(Note that there is a pre-existing issue that we produce the same code
for fptosi and fptoui.)
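
A rough C++ sketch of the approach described above, loosely modelled on
the compiler-rt routine linked there: the sign, exponent and significand
are extracted and shifted in the float-as-integer type (uint32_t for
float), and the value is cast to the target type only for the final
result or just before a left shift. The name fptosi_sketch is
hypothetical, and the sketch assumes a finite input whose truncated value
fits in the target type, mirroring the fact that out-of-range fptosi
results are not defined.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Precondition (assumption of this sketch): f is finite and its truncated
// value is representable in Int.
template <typename Int>
Int fptosi_sketch(float f) {
  using UInt = std::make_unsigned_t<Int>;

  // All field extraction happens in the 32-bit float-as-integer type.
  uint32_t rep;
  std::memcpy(&rep, &f, sizeof rep);
  const bool negative = (rep >> 31) != 0;
  const int exponent = static_cast<int>((rep >> 23) & 0xFF) - 127;
  const uint32_t significand = (rep & 0x7FFFFFu) | 0x800000u;

  // |f| < 1 (including zero and denormals) truncates to 0.
  if (exponent < 0)
    return 0;

  UInt magnitude;
  if (exponent < 23) {
    // Right shift in the 32-bit type, then cast the final result.
    magnitude = static_cast<UInt>(significand >> (23 - exponent));
  } else {
    // Cast to the target type before the left shift so no bits are lost
    // when the target is wider than 32 bits.
    magnitude = static_cast<UInt>(significand) << (exponent - 23);
  }

  // Apply the sign with a wrapping negation in the unsigned type.
  return static_cast<Int>(negative ? UInt(0) - magnitude : magnitude);
}
```

With a 16-bit target only the right-shift path is reachable for in-range
inputs, so no intermediate value is ever wider than the 32-bit
representation.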
Allow testing fptoui/fptosi on half types, which are small enough
for alive2 to verify the result.
They currently pass for non-undef/poison input. (The fptoui
expansion is the same as fptosi, which is confusing, but not
incorrect, because the saturation it performs is not actually
required by fptoi.)
- Remove pass initialization calls from pass constructors.
- For some passes, add the initialization to `initializeCodeGen` or
`initializeGlobalISel`.
- Remove redundant initializations from llc and X86 target for some
passes.
This is mostly boilerplate to move various freestanding utility
functions into LegalizerHelper. LibcallLoweringInfo is currently
optional, mostly because threading it through assorted other
uses of LegalizerHelper is more difficult.
I had a lot of trouble getting this to work in the legacy pass
manager with setRequiresCodeGenSCCOrder, and am not happy with the
result. A sub-pass manager is introduced and this is invalidated,
so we're re-computing this unnecessarily.
The pass now contains a non-fp expansion and should
be used for any similar expansions regardless of the
types involved. Hence a generic name seems apt.
Rename the source files, pass, and adjust the pass
description. Move all tests for the expansions
that have previously been merged into the pass
to a single directory.