3883 Commits

Author SHA1 Message Date
Simon Pilgrim
4baf29e81e [DAG] Handle cases where a shift amount is larger than the pre-extended value bitwidth
In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds.

Fixes #106202
2024-08-27 18:12:24 +01:00
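
A minimal sketch of why the shift amount matters for this fold, modelled in plain integer arithmetic rather than with SelectionDAG nodes. The widths (i8 extended to i16, then to i32) and all function names are assumptions chosen for illustration; this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Original pattern: zext i32 (shl i16 (zext i16 (i8 x)), c).
// The shift happens in 16 bits, so anything pushed past bit 15 is lost.
uint32_t originalPattern(uint8_t x, unsigned c) {
  assert(c < 16 && "shift amount must be in range for the i16 shift");
  uint16_t inner = static_cast<uint16_t>(x);
  uint16_t shifted = static_cast<uint16_t>(inner << c);
  return static_cast<uint32_t>(shifted);
}

// Folded pattern: shl i32 (zext i32 (i8 x)), c.
uint32_t foldedPattern(uint8_t x, unsigned c) {
  assert(c < 32 && "shift amount must be in range for the i32 shift");
  return static_cast<uint32_t>(x) << c;
}

// Hypothetical safety check: the inner zext adds 16 - 8 = 8 known-zero
// bits, so only shifts of at most 8 are guaranteed to lose nothing.
bool foldIsSafe(unsigned c) { return c <= 8; }

// Exhaustively compares both patterns over every i8 input.
bool patternsAgreeForAll(unsigned c) {
  for (unsigned x = 0; x < 256; ++x)
    if (originalPattern(static_cast<uint8_t>(x), c) !=
        foldedPattern(static_cast<uint8_t>(x), c))
      return false;
  return true;
}
```

In this model the two patterns agree for every input when `c` is at most 8 and diverge at `c == 9`, which is why a plain comparison against the number of added zero bits is enough.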
Simon Pilgrim
807557654a [DAG] visitTRUNCATE_USAT_U - use sd_match to match FP_TO_UINT_SAT pattern. NFC. 2024-08-23 16:39:32 +01:00
Sumanth Gundapaneni
e78156a0e2
Scalarize the vector inputs to llvm.lround intrinsic by default. (#101054)
The verifier is updated in a different patch to allow vector types for
the llvm.lround and llvm.llround intrinsics.
2024-08-21 12:13:56 -05:00
Björn Pettersson
278fc8efdf
[DAGCombiner] Fix ReplaceAllUsesOfValueWith mutation bug in visitFREEZE (#104924)
In visitFREEZE we have been collecting a set/vector of
MaybePoisonOperands that later was iterated over, applying a freeze to
those operands. However, C-level fuzzy testing has discovered that the
recursiveness of ReplaceAllUsesOfValueWith may cause later operands in
the MaybePoisonOperands vector to be replaced when replacing an earlier
operand. That would then turn up as
   Assertion `N1.getOpcode() != ISD::DELETED_NODE &&
              "Operand is DELETED_NODE!"' failed.
failures when trying to freeze those later operands.

So we need to make sure that the vector with MaybePoisonOperands is
mutated as well when needed. Or as the solution used in this patch, make
sure to keep track of operand numbers that should be frozen instead of
having a vector of SDValues. And then we can refetch the operands while
iterating over operand numbers.

The problem was seen after adding SELECT_CC to the set of operations
included in "AllowMultipleMaybePoisonOperands". I'm not sure, but I
guess that this could happen for other operations as well for which we
allow multiple maybe poison operands.
2024-08-21 17:56:27 +02:00
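
The difference between holding on to operand values and holding on to operand numbers can be sketched with a toy model. Operands are plain strings here, and `replaceAllUses` is a hypothetical stand-in that rewrites nested occurrences too, loosely imitating the recursive effect described above. None of these names come from LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for ReplaceAllUsesOfValueWith: rewrites every occurrence of
// From, including occurrences nested inside other operands.
// Returns the number of operands that changed.
std::size_t replaceAllUses(std::vector<std::string> &Ops,
                           const std::string &From, const std::string &To) {
  assert(!From.empty() && "cannot replace an empty value");
  std::size_t Changed = 0;
  for (std::string &Op : Ops) {
    bool Touched = false;
    for (std::size_t Pos = Op.find(From); Pos != std::string::npos;
         Pos = Op.find(From, Pos + To.size())) {
      Op.replace(Pos, From.size(), To);
      Touched = true;
    }
    Changed += Touched ? 1 : 0;
  }
  return Changed;
}

// Snapshot approach: operands are captured by value up front, so a later
// entry can go stale. Returns false when a stale operand is encountered.
bool freezeBySnapshot(std::vector<std::string> &Ops) {
  const std::vector<std::string> Snapshot = Ops;
  for (const std::string &Old : Snapshot)
    if (replaceAllUses(Ops, Old, "freeze(" + Old + ")") == 0)
      return false; // Old no longer exists anywhere in Ops
  return true;
}

// Index approach: remember operand numbers and refetch the current operand.
bool freezeByIndex(std::vector<std::string> &Ops) {
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    const std::string Current = Ops[I]; // copy, because Ops is mutated below
    if (replaceAllUses(Ops, Current, "freeze(" + Current + ")") == 0)
      return false;
  }
  return true;
}
```

With operands `{"x", "f(x)"}`, freezing `x` also rewrites the second operand, so the snapshot's `f(x)` is no longer present, while the index-based loop simply picks up the rewritten operand.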
Simon Pilgrim
8109e5de57
[DAG] Add select_cc -> abd folds (#102137)
Fixes #100810
2024-08-21 12:07:40 +01:00
Tianqing Wang
7f87b5bf0e
[SelectionDAG][X86] Preserve unpredictable metadata for conditional branches in SelectionDAG, as well as JCCs generated by X86 backend. (#102101)
This builds on 09515f2c2, which preserves unpredictable metadata in
CodeGen for `select`. This patch does it for conditional branches.
2024-08-19 11:04:48 +08:00
Craig Topper
067f2e9f18 [SelectionDAG] Use getSignedConstant/getAllOnesConstant. 2024-08-17 00:04:01 -07:00
Craig Topper
321de07b77
[DAGCombiner] Remove TRUNCATE_(S/U)SAT_(S/U) from an assert that isn't tested. NFC (#104466) 2024-08-16 08:42:55 -07:00
Craig Topper
e027e04f01
[DAGCombiner] Don't let scalarizeBinOpOfSplats create illegal scalar MULHS/MULHU (#104518)
Type legalization lacks generic support for these operations. They are
normally only created when the type is legal. This scalarization case is
new.

We could update type legalization, but there are some corner cases that make
it not straightforward. For example, if the promoted type isn't 2x the
narrow type we need to over promote.

Fixes #104480
2024-08-15 21:07:22 -07:00
YunQiang Su
fb9e685fc4
Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649)
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.

This patch introduces support only for scalar values. The
support of
  vector (vp, vp.reduce, vector.reduce),
  experimental.constrained
will be added in future patches.

With this patch, MIPSr6 and LoongArch can work out of the box with
fcanonical and fmax/fmin.

AArch64/PowerPC64 can use the same logic as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.

The FMIN/FMAX instructions of RISC-V follow the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in a
future patch.

Background

https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platforms for NUM vs sNaN:
   1) Fallback to fmin(3)/fmax(3): return qNaN.
   2) ARM64/ARM32+Neon: same as libc.
   3) MIPSr6/LoongArch/RISC-V: return NUM.

And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will be submitted as separate patches.
2024-08-15 14:09:36 +08:00
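
A rough sketch of the value semantics that minimumNumber and maximumNumber are described as having in IEEE 754-2019, written as ordinary C++ functions. The function names are made up for this example, and the sketch does not model the invalid-operation exception that a signaling NaN would raise.

```cpp
#include <cmath>
#include <limits>

// Sketch of minimumNumber: a NaN operand is treated as missing data, so the
// other operand wins; the result is NaN only when both operands are NaN.
// Signed zeros are ordered, with -0.0 below +0.0.
double minimumNumber(double a, double b) {
  if (std::isnan(a) && std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  if (a == 0.0 && b == 0.0) return std::signbit(a) ? a : b;
  return a < b ? a : b;
}

// Sketch of maximumNumber: same NaN handling, with +0.0 above -0.0.
double maximumNumber(double a, double b) {
  if (std::isnan(a) && std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  if (a == 0.0 && b == 0.0) return std::signbit(a) ? b : a;
  return a > b ? a : b;
}
```

The "return NUM" behaviour listed for MIPSr6, LoongArch and RISC-V corresponds to the two single-NaN branches in this sketch.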
Froster
234cb4c6e3
[SelectionDAG] Scalarize binary ops of splats before legal types (#100749)
Fixes #65072. This allows binary ops of splats to be scalarized if the
operation isn't legal on the element type, but is legal on the type it
will be legalized to. I assume that if an op is legal in both scalar and
vector form, choosing the scalar version should always be better no
matter what the type is.

There are some cases that my approach can't scalarize, for example:
``` llvm
; test/CodeGen/RISCV/rvv/select-int.ll
define <vscale x 4 x i64> @select_nxv4i64(i1 zeroext %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b) {
  %v = select i1 %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b
  ret <vscale x 4 x i64> %v
}
```
https://godbolt.org/z/xzqrKrxvK
`xor (splat i1, splat i1)` is generated in a late step after LegalizeType,
from select. I didn't figure out how to make `xor i1, i1` legal at this
time.

---------

Co-authored-by: Luke Lau <luke@igalia.com>
2024-08-15 00:07:00 +08:00
Kazu Hirata
5ce326ccb1
[SelectionDAG] Construct SmallVector with ArrayRef (NFC) (#103705) 2024-08-14 08:22:20 -07:00
hanbeom
0d074ba197
[DAG] Support saturated truncate (#99418)
A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it.

Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs:

`ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and the range of values matches the range of signed values of the destination type.

`ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type.

`ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type.

These ISDs indicate a saturated truncate.

Fixes https://github.com/llvm/llvm-project/issues/85903
2024-08-14 09:52:36 +01:00
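
As an illustration of the three cases, here is a scalar sketch for a 32-bit to 8-bit truncate. It models each node as "clamp to the destination range, then truncate"; the function names and the choice of widths are assumptions for this example and say nothing about how any target lowers the nodes.

```cpp
#include <algorithm>
#include <cstdint>

// TRUNCATE_SSAT_S model: signed input, clamped to the signed i8 range.
int8_t truncSSatS(int32_t v) {
  return static_cast<int8_t>(std::clamp<int32_t>(v, INT8_MIN, INT8_MAX));
}

// TRUNCATE_SSAT_U model: signed input, clamped to the unsigned i8 range.
uint8_t truncSSatU(int32_t v) {
  return static_cast<uint8_t>(std::clamp<int32_t>(v, 0, UINT8_MAX));
}

// TRUNCATE_USAT_U model: unsigned input, clamped to the unsigned i8 range.
uint8_t truncUSatU(uint32_t v) {
  return static_cast<uint8_t>(std::min<uint32_t>(v, UINT8_MAX));
}
```

Once the value is already inside the destination range, the truncate drops only redundant bits, which is the situation the new nodes are meant to flag.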
Craig Topper
51bad732dc [SelectionDAG] Replace EVTToAPFloatSemantics with MVT/EVT::getFltSemantics. (#103001) 2024-08-13 11:35:28 -07:00
Pierre van Houtryve
7389545d0d
Reapply "[AMDGPU] Always lower s/udiv64 by constant to MUL" (#101942)
Reland #100723, fixing the ARM issue at the cost of a small loss of optimization in `test/CodeGen/AMDGPU/fshr.ll`

Solves #100383
2024-08-12 09:00:22 +02:00
Kazu Hirata
f4fb735840
[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578) 2024-08-09 09:15:42 -07:00
Simon Pilgrim
ad00e8a8dd [DAG] Replace m_SpecificInt(1) -> m_One()
For SDPatternMatch there's no difference in undef/poison vector element handling - in fact m_One() just wraps m_SpecificInt(1)
2024-08-08 18:20:46 +01:00
Simon Pilgrim
13d04fa560 [DAG] Add legalization handling for ABDS/ABDU (#92576) (REAPPLIED)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv

REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
2024-08-08 11:39:05 +01:00
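
The `abdu` expansion quoted above can be checked exhaustively on 8-bit values. This sketch assumes the overflow result of `usub_overflow` is widened to an all-ones mask before the `xor` and `sub`; the function names are invented for the example.

```cpp
#include <cstdint>

// sub(xor(sub(lhs, rhs), ov), ov), with ov as an all-ones mask on overflow.
uint8_t abduExpanded(uint8_t lhs, uint8_t rhs) {
  uint8_t diff = static_cast<uint8_t>(lhs - rhs); // wrapping subtraction
  uint8_t mask = lhs < rhs ? 0xFF : 0x00;         // unsigned borrow -> mask
  // When mask is all ones this computes ~diff + 1, i.e. the negated diff.
  return static_cast<uint8_t>((diff ^ mask) - mask);
}

// Straightforward reference: unsigned absolute difference.
uint8_t abduReference(uint8_t lhs, uint8_t rhs) {
  return static_cast<uint8_t>(lhs > rhs ? lhs - rhs : rhs - lhs);
}

// Compares both versions over all 65536 input pairs.
bool expansionMatchesReference() {
  for (unsigned l = 0; l < 256; ++l)
    for (unsigned r = 0; r < 256; ++r)
      if (abduExpanded(static_cast<uint8_t>(l), static_cast<uint8_t>(r)) !=
          abduReference(static_cast<uint8_t>(l), static_cast<uint8_t>(r)))
        return false;
  return true;
}
```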
Simon Pilgrim
e4e96b3e26 Revert b1234ddbe2652aa7948242a57107ca7ab12fd2f8. "[DAG] Add legalization handling for ABDS/ABDU (#92576)"
Reverting #92576 while we identify a reported regression
2024-08-07 17:11:25 +01:00
Simon Pilgrim
6e60d549d4 [DAG] Add foldSelectToABD helper. NFC.
Pull out of visitVSELECT to allow reuse in the future.
2024-08-06 13:31:53 +01:00
Simon Pilgrim
b1234ddbe2
[DAG] Add legalization handling for ABDS/ABDU (#92576)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
2024-08-06 10:18:06 +01:00
Luke Lau
33fc322696
[SelectionDAG] Simplify vselect true, T, F -> T (#100992)
This addresses a TODO where we can fold a vselect to its true operand
if the boolean is known to be all trues, by factoring out the logic from
extractBooleanFlip which checks TLI.getBooleanContents.
2024-08-06 10:49:20 +08:00
Kazu Hirata
8d1b17b662
[CodeGen] Construct SmallVector with ArrayRef (NFC) (#101841) 2024-08-04 00:41:29 -07:00
Michael Maitland
22ce33304e Revert "[DAG][NFC] Use SDPatternMatch for VScale in some instances"
This reverts commit d2304427cb0270259bc083a3db27413823f56e59.

The m_Add and m_Mul are commutative but the code does not expect the
commutativity.
2024-07-31 04:55:13 -07:00
Michael Maitland
d2304427cb [DAG][NFC] Use SDPatternMatch for VScale in some instances 2024-07-29 06:50:27 -07:00
Matt Davis
404071b059
[SelectionDAG] Preserve volatile undef stores. (#99918)
This patch preserves `undef` SDNodes that are `volatile` qualified.
Previously, these nodes would be discarded. The motivation behind this
change is to adhere to the
[LangRef](https://llvm.org/docs/LangRef.html#volatile-memory-accesses),
even though that doc is mostly in terms of LLVM-IR, it seems reasonable
to infer that the volatile constraints also apply to SDNodes.

> Certain memory accesses, such as
[load](https://llvm.org/docs/LangRef.html#i-load)’s,
[store](https://llvm.org/docs/LangRef.html#i-store)’s, and
[llvm.memcpy](https://llvm.org/docs/LangRef.html#int-memcpy)’s may be
marked volatile. The optimizers must not change the number of volatile
operations or change their order of execution relative to other volatile
operations. The optimizers may change the order of volatile operations
relative to non-volatile operations. This is not Java’s “volatile” and
has no cross-thread synchronization behavior.

Source: https://llvm.org/docs/LangRef.html#volatile-memory-accesses
2024-07-24 08:41:56 -04:00
David Green
b42fe6740e
[DAG] Add users of operand of simplified extract_vector_elt to worklist (#100074)
This helps to ensure we revisit the last extract_element uses of a node
so that it can be optimized away in cases such as extract(insert(scalartovec(x), 1), 0).
2024-07-23 16:34:09 +01:00
Björn Pettersson
2b78303e3f
[DAGCombiner] Freeze maybe poison operands when folding select to logic (#84924)
Just like for regular IR we need to treat SELECT as conditionally
blocking poison in SelectionDAG. So (unless the condition itself is
poison) the result is only poison if the selected true/false value is
poison.
Thus, when doing DAG combines that turn SELECT into arithmetic/logical
operations (e.g. AND/OR) we need to make sure that the new operations
aren't more poisonous. One way to do that is to use FREEZE to make
sure the operands aren't poison.

This patch aims at fixing the kind of miscompiles reported in
  https://github.com/llvm/llvm-project/issues/84653
and
  https://github.com/llvm/llvm-project/issues/85190

The solution is to make sure that we insert FREEZE, if needed to make
the fold sound, when using the foldBoolSelectToLogic and
foldVSelectToSignBitSplatMask DAG combines.
2024-07-22 17:19:46 +02:00
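
A small model of the problem, using `std::optional<bool>` with `std::nullopt` standing in for poison. The helper names are hypothetical, and `freezeOp` picks `false` as its arbitrary fixed value purely for the sake of the example.

```cpp
#include <optional>

using MaybePoison = std::optional<bool>; // std::nullopt models poison

// select c, t, f: poison only if c is poison or the selected arm is poison.
MaybePoison selectOp(MaybePoison c, MaybePoison t, MaybePoison f) {
  if (!c) return std::nullopt;
  return *c ? t : f;
}

// and a, b: poison as soon as either operand is poison.
MaybePoison logicAnd(MaybePoison a, MaybePoison b) {
  if (!a || !b) return std::nullopt;
  return *a && *b;
}

// freeze: replaces poison with some fixed value (false in this sketch).
MaybePoison freezeOp(MaybePoison v) { return v ? v : MaybePoison(false); }

// select c, t, false -> and c, t. More poisonous than the select when
// c is false and t is poison.
MaybePoison foldNaive(MaybePoison c, MaybePoison t) { return logicAnd(c, t); }

// Same fold with the possibly-poison operand frozen first.
MaybePoison foldWithFreeze(MaybePoison c, MaybePoison t) {
  return logicAnd(c, freezeOp(t));
}
```

For `c == false` and a poison `t`, the select yields `false`, the naive fold yields poison, and the frozen fold yields `false` again.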
Bjorn Pettersson
8ebe7e60f5 [DAGCombiner] Push freeze through SETCC and SELECT_CC (#64718)
Allow pushing freeze through SETCC and SELECT_CC even if there are
multiple "maybe poison" operands. In the past we have limited it to
a single "maybe poison" operand, but it seems profitable to also
allow the multiple operand scenario.

One goal here is to avoid some regressions seen in review of
  https://github.com/llvm/llvm-project/pull/84924
when solving the select->and miscompiles described in
  https://github.com/llvm/llvm-project/issues/84653
2024-07-22 16:01:59 +02:00
Simon Pilgrim
f406d83d95 [DAG] widenCtPop - reuse existing SDLoc. NFC. 2024-07-22 11:24:23 +01:00
Joseph Huber
615b7eeaa9 Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.

I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
2024-07-20 09:29:31 -05:00
NAKAMURA Takumi
740161a9b9 Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69.
(llvmorg-19-init-17714-gc05126bdfc3b)
See #99610
2024-07-20 12:36:57 +09:00
Simon Pilgrim
497ea1d849 [DAG] tryToFoldExtendSelectLoad - reuse existing SDLoc. NFC. 2024-07-18 16:19:15 +01:00
Lawrence Benson
177ce1900f
[LLVM] Add llvm.experimental.vector.compress intrinsic (#92289)
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.

The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many targets as it
avoids branches.

In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.

See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
2024-07-17 14:24:24 +02:00
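
A scalar sketch of the compress operation described above. It assumes that lanes not written by a selected value keep the `passthru` element at the same index; the function name and the use of `std::vector` are choices made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Moves the selected elements of value to consecutive low lanes of the
// result; the remaining lanes keep whatever passthru holds at that index.
std::vector<int> compress(const std::vector<int> &value,
                          const std::vector<bool> &mask,
                          const std::vector<int> &passthru) {
  assert(value.size() == mask.size() && value.size() == passthru.size() &&
         "all three vectors must have the same number of lanes");
  std::vector<int> result = passthru;
  std::size_t out = 0;
  for (std::size_t i = 0; i < value.size(); ++i)
    if (mask[i])
      result[out++] = value[i];
  return result;
}
```

Because the result stays in a register-like value, a caller is free to follow it with an ordinary full-width store instead of a guarded per-element store.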
Joseph Huber
c05126bdfc
[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevents internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts-out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but do not want the overhead of
uncalled functions. (These trivially add about a second to the link time.)
2024-07-16 06:22:09 -05:00
Simon Pilgrim
290537238b [X86] visitADDLike - pull out repeated SDLoc. NFC. 2024-07-15 17:20:58 +01:00
Simon Pilgrim
3560e1d0ce [DAG] visitADDLike - convert (A-B)+(C-D) --> (A+C)-(B+D) fold to sd_match. NFC. 2024-07-15 17:20:58 +01:00
Simon Pilgrim
ba8792b667 [X86] visitFCOPYSIGN - pull out repeated SDLoc. NFC. 2024-07-15 16:34:21 +01:00
Simon Pilgrim
61a4e1e70f
[DAG] Add SDPatternMatch::m_SetCC and update some combines to use it (#98646)
The plan is to add more TernaryOp in the future (SELECT/VSELECT and FMA in particular)
2024-07-14 17:18:43 +01:00
Kazu Hirata
66cd2e0f9a
[CodeGen] Use range-based for loops (NFC) (#98706) 2024-07-13 13:29:47 -07:00
AtariDreams
4f8b2fff6d
[DAG] Use break instead of continue to leave do while (false) loop (NFC) (#97966) 2024-07-10 20:51:06 +04:00
Craig Topper
33112cbf59 [DAGCombiner] Remove unnecessary assert from getShiftAmountTy wrapper. NFC
The same assert appears in the TargetLowering function.

Refine comment to describe as a convenience wrapper and leave it to
TargetLowering documentation to explain.
2024-07-04 19:05:54 -07:00
Craig Topper
8419da8bd4
[SelectionDAG] Remove LegalTypes argument from getShiftAmountConstant. (#97653)
#97645 proposed to remove LegalTypes from getShiftAmountTy. This patch
removes it from getShiftAmountConstant which is one of the callers of
getShiftAmountTy.
2024-07-04 18:33:25 -07:00
Craig Topper
3141c11fe8
[SelectionDAG] Remove LegalTypes argument from getShiftAmountTy. NFC (#97757)
This argument is no longer used inside the function. Remove it from the
interface.
2024-07-04 15:24:54 -07:00
Craig Topper
34fe032fdb
[DAGCombiner] Use getShiftAmountConstant where possible. (#97683)
In #97645, I proposed removing the LegalTypes operand to
TargetLowering::getShiftAmountTy. This means we don't need to use the
DAGCombiner wrapper for getShiftAmountTy that manages this flag. Now we
can use getShiftAmountConstant and let it call
TargetLowering::getShiftAmountTy.
2024-07-04 08:44:50 -07:00
Craig Topper
a3c5c83273 [DAGCombiner] Remove unneeded getValueType() calls in visitMULHS/MULHU. NFC
We have an existing VT variable that should match N0.getValueType.
2024-07-03 13:35:04 -07:00
Simon Pilgrim
163d00c666 [DAG] Pull out repeated SDLoc in SELECT/SETCC folds. NFC. 2024-07-01 18:03:46 +01:00
Matt Arsenault
8eee6d33f7
DAG: Call SimplifyDemandedBits on copysign value operand (#97180)
So far the only cases that seem to benefit are the weird
copysign with different typed inputs.
2024-07-01 12:29:11 +02:00
Matt Arsenault
db9252b115
DAG: Call SimplifyDemandedBits on fcopysign sign value (#97151)
Math library code has quite a few places with complex bit
logic that are ultimately fed into a copysign. This helps
avoid some regressions in a future patch.

This assumes the position of the sign bit in the float type, which should
at least be valid for IEEE types. Not sure if we need to guard
against ppc_fp128 or anything else weird.

There appears to be some value in simplifying the value operand
as well, but I'll address that separately.
2024-07-01 12:19:17 +02:00
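
The reason the sign operand is a good candidate for demanded-bits simplification can be sketched on raw IEEE binary32 bit patterns. The helpers below are illustrative only and assume `float` is a 32-bit IEEE type with the sign in the top bit.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

constexpr uint32_t SignMask = 0x80000000u;

// Reinterpret a float as its bit pattern and back.
uint32_t bitsOf(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

float floatOf(uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// copysign on raw bits: magnitude bits from mag, sign bit from sign.
uint32_t copysignBits(uint32_t mag, uint32_t sign) {
  return (mag & ~SignMask) | (sign & SignMask);
}

// Only the top bit of the sign operand is demanded: clearing its other 31
// bits never changes the result.
bool lowSignBitsAreDead(uint32_t mag, uint32_t sign) {
  return copysignBits(mag, sign) == copysignBits(mag, sign & SignMask);
}
```

Any bit logic that only affects the low 31 bits of the sign operand is therefore dead from the point of view of the copysign.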
Nikita Popov
f2f18459d4 Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.

This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 08:34:04 +02:00