These are long overdue for removal. These were originally a hack
to support loading half values before there was any / decent support
for the half type through the backend. There's no reason to continue
supporting these, they're equivalent to fpext/fptrunc with a bitcast.
SelectionDAG stopped translating these directly, and used the
bitcast + fp cast since f7a02c17628e825, so there's been no reason
to use these since 2014.
Changes this to getSigned() to match the signedness of the calculation.
However, we still need to allow truncation because the addition
result may overflow, and the operation is specified to truncate
in that case.
Fixes https://github.com/llvm/llvm-project/issues/175159.
Addresses https://github.com/llvm/llvm-project/issues/173267.
- folds log(-x) -> NaN
- folds log(0) -> -inf
- also folds log(1) -> 0.0 without host libm
> note: log(inf) is also doable but it causes some other tests to fail
so I avoided it for now
This folds `icmp (ptrtoaddr x, ptrtoaddr y)` to `icmp (x, y)`, matching
the existing ptrtoint fold. Restrict both folds to only the case where
the result type matches the address type.
I think that all folds this can do in practice end up actually being
valid for ptrtoint to a type large than the address size as well, but I
don't really see a way to justify this generically without making
assumptions about what kind of folding the recursive calls may do.
This is based on the icmp semantics specified in
https://github.com/llvm/llvm-project/pull/163936.
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.
This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)
It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
Simplifcation of vector.reduce intrinsics are prevented by an early
bailout for ConstantInt base operands. This PR removes the bailout and
updates the tests to show matching output when
-use-constant-int-for-*-splat is used.
We can support the same folds for as for ptrtoint. For the cast pair
fold we just need to use the address type instead. The gep based folds
were already operating on the address (aka index) width anyway.
isEliminableCastPair() currently tries to support elimination of
ptrtoint/inttoptr cast pairs by assuming that the maximum possible
pointer size is 64 bits. Of course, this is no longer the case nowadays.
This PR changes isEliminableCastPair() to accept an optional DataLayout
argument, which is required to eliminate pointer casts.
This means that we no longer eliminate these cast pairs during ConstExpr
construction, and instead only do it during DL-aware constant folding.
This had a lot of annoying fallout on tests, most of which I've
addressed in advance of this change.
Fix the NaN-handling semantics of various NVVM intrinsics converting
from fp types to integer types.
Previously in ConstantFolding, NaN inputs would be constant-folded to 0.
However, v9.0 of the PTX spec states that:
In float-to-integer conversions, depending upon conversion types, NaN
input results in following value:
* Zero if source is not `.f64` and destination is not `.s64`, .`u64`.
* Otherwise `1 << (BitWidth(dst) - 1)` corresponding to the value of
`(MAXINT >> 1) + 1` for unsigned type or `MININT` for signed type.
Also, support for constant-folding +/-Inf and values which
overflow/underflow the integer output type has been added (they clamp to
min/max int).
Because of this NaN-handling semantic difference, we also need to
disable transforming several intrinsics to FPToSI/FPToUI, as the LLVM
intstruction will return poison, but the intrinsics have defined
behaviour for these edge-cases like NaN/Inf/overflow.
Scalable get_active_lane_mask intrinsics with a range of 0 can be
lowered to zeroinitializer. This helps remove no-op scalable masked
stores and loads.
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
with cast flags preserved(optional).
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
In #149619, for the test of `@dot_follow_modulo_spec_2`, constant
folding the addition of two i32 1073741824 causes an overflow from 2^32
to -2^32=-2147483648, which triggers the UB sanitizer. This PR reapplies
the previous PR, explicitly casting the addition operand to int64_t
first before performing the addition before producing a int32 number via
`Constant *C = get(cast<IntegerType>(Ty->getScalarType()), V, isSigned)`
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:
1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
index-width bits of the pointer
For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.
This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.
RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/139357
The `nvvm_round` intrinsic should round to the nearest even number in
the case of ties. It lowers to PTX `cvt.rni`, which will "round to
nearest integer, choosing even integer if source is equidistant between
two integers", so it matches the semantics of `rint` (and not `round` as
the name suggests).
This is a follow on to
https://github.com/llvm/llvm-project/pull/115407 that introduced code
which bypasses the splat handling for scalable vectors. To maintain
existing tests I have moved the early return until after the splat
handling so all vector types are treated equally.
Fix a failing test for constant-folding the nvvm_round intrinsic. The
original implementation added in #141233 used a native libm call to the
"round" function, but on PPC this produces +0.0 if the input is -0.0,
which caused a test failure.
This patch updates it to use APFloat functions instead of native libm
calls to ensure cross-platform consistency.
This consolidates the "fold poison arg to poison result" constant
folding logic for intrinsics, based on a common
intrinsicPropagatesPoison() helper, which is also used for poison
propagation reasoning in ValueTracking. This ensures that the set of
supported intrinsics is consistent.
This add ucmp, scmp, smul.fix, smul.fix.sat, canonicalize and sqrt to
the intrinsicPropagatesPoison list, as these were handled by
ConstantFolding but not ValueTracking. The ctpop test is an example of
the converse, where it was handled by ValueTracking but not
ConstantFolding.
C's Annex F specifies that atan +/-0.0 returns the input value;
however, this behavior is optional and host C libraries may behave
differently. This change applies the Annex F behavior to constant
folding by LLVM.
Ref:
https://pubs.opengroup.org/onlinepubs/9799919799/functions/atan.html
ReadDataFromGlobal() did not handle reads from the padding of types (in
the sense of type store size != type alloc size, rather than struct
padding).
Return zero in that case.
Fixes https://github.com/llvm/llvm-project/issues/144279.
The change adds folding for 4 vector intrinsics: `interleave2`,
`deinterleave2`, `vector_extract` and `vector_insert`. For the last 2
intrinsics the change does not use `ShuffleVector` fold mechanism as
it's much simpler to construct result vector explicitly.
I noticed this when a sqrt produced by VectorCombine with a poison
operand wasn't getting folded away to poison.
Most intrinsics in general could probably be folded to poison if one of
their arguments are poison too. Are there any exceptions to this we need
to be aware of?
Add an optional flag to disable constant-folding for function calls.
This applies to both intrinsics and libcalls.
This is not necessary in most cases, so is disabled by default, but in
cases that require bit-exact precision between the result from
constant-folding and run-time execution, having this flag can be useful,
and may help with debugging. Cases where mismatches can occur include
GPU execution vs host-side folding, cross-compilation scenarios, or
compilation vs execution environments with different math library
versions.
This applies only to calls, rather than all FP arithmetic. Methods such
as fast-math-flags can be used to limit reassociation, fma-fusion etc,
and basic arithmetic operations are precisely defined in IEEE 754.
However, other math operations such as sqrt, sin, pow etc. represented
by either libcalls or intrinsics are less well defined, and may vary
more between different architectures/library implementations.
As this option is not intended for most common use-cases, this patch
takes the more conservative approach of disabling constant-folding even
for operations like fmax, copysign, fabs etc. in order to keep the
implementation simple, rather than sprinkling checks for this flag
throughout.
The use-cases for this option are similar to StrictFP, but it is only
limited to FP call folding, rather than all FP operations, as it is
about precise arithmetic results, rather than FP environment behaviours.
It also can be used to when linking .bc files compiled with different
StrictFP settings with llvm-link.
This is a follow up from #141845.
TargetTransformInfo::getOperandInfo needs to be updated to check for
undef values as otherwise a splat is considered a constant, and some
RISC-V cost model tests will start adding a cost to materialize the
constant.
As noted in
https://github.com/llvm/llvm-project/pull/141821#issuecomment-2917328924,
whilst we currently constant fold intrinsics of fixed-length vectors via
their scalar counterpart, we don't do the same for scalable vectors.
This handles the scalable vector case when the operands are splats.
One weird snag in ConstantVector::getSplat was that it produced a undef
if passed in poison, so this also contains a fix by checking for
PoisonValue before UndefValue.
Add constant-folding support for the maximumnum and minimumnum
intrinsics, and extend the tests to show the qnan vs snan behavior
differences between maxnum/maximum/maximumnum.
I tried to use these with a const reference in a separate patch, but the
pointers weren't marked as const. It turns out that these don't mutate
the instruction.