3891 Commits

Author SHA1 Message Date
David Green
2242cd2b6a
[DAG] Fold vecreduce.or(sext(x)) to sext(vecreduce.or(x)) (#108959)
The same is true for and / xor reductions, where the sext / zext can be
sank down through the bitwise operation.
https://alive2.llvm.org/ce/z/TvzCd5
2024-09-17 15:24:00 +01:00
Matt Arsenault
c49a1ae6d6 DAG: Reorder isFMAFasterThanFMulAndFAdd checks (NFC)
Basic legality checks should be first.
2024-09-15 16:33:01 +04:00
Robert Dazi
8837898b8d
[DAGCombine] Count leading ones: refine post DAG/Type Legalisation if promotion (#102877)
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.

This PR has two parts:

* `SDPatternMatch/MatchContext`: Modify a little bit the code to match
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of only `m_Opc`
supported. Some tests were added to ensure no regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detect and rewrite the
patterns using matching context.

Remaining Tasks:

- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead ?
- [ ] Style: Add a visitVP_SUB ?? Move `foldSubCtlzNot` in another
location for style consistency purpose ?

@topperc

---------

Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
2024-09-15 15:48:36 +04:00
Simon Pilgrim
5910e8d607 [DAG] visitUDIV - call SimplifyDemandedBits to handle hidden constant foldable cases
Fixes #108728
2024-09-15 12:29:28 +01:00
Simon Pilgrim
69a21154ca
[DAG] Fold trunc(srl(extract_elt(vec,c1),c2)) -> extract_elt(bitcast(vec),c3) (#107987)
Extends existing trunc(extract_elt(vec,c1)) -> extract_elt(bitcast(vec),c3) fold.

Noticed while working on #107404
2024-09-13 15:13:58 +01:00
Simon Pilgrim
6ec889e53f [DAG] Add support for neg(abd(x,y)) patterns.
Currently limited to cases which have legal/custom ABDS/ABDU handling - I'll extend this for all targets in future (similar to how we support neg(abs(x))) once I've addressed some outstanding regressions on aarch64/riscv.

Helps avoid a lot of extra cmov instructions on x86 in particular, and allows us to more easily improve the codegen in future commits.
2024-09-06 13:16:09 +01:00
Princeton Ferro
8f77d37f25
[DAGCombiner] cache negative result from getMergeStoreCandidates() (#106949)
Cache negative search result from getStoreMergeCandidates() so that
mergeConsecutiveStores() does not iterate quadratically over a
potentially long sequence of unmergeable stores.
2024-09-04 18:18:53 +04:00
Simon Pilgrim
b25b9a7d6c [DAG] visitSELECT - add "select usubo(x, y).overflow, (sub y, x), (usubo x, y) -> abdu(x, y)" fold (and neg equivalent)
Handle cases where CGP has merged the CMP+SUB into a USUBO node - improves a few outstanding niggles from #100810
2024-09-04 11:59:10 +01:00
Simon Pilgrim
4baf29e81e [DAG] Handle cases where a shift amount is larger than the pre-extended value bitwidth
In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds.

Fixes #106202
2024-08-27 18:12:24 +01:00
Simon Pilgrim
807557654a [DAG] visitTRUNCATE_USAT_U - use sd_match to match FP_TO_UINT_SAT pattern. NFC. 2024-08-23 16:39:32 +01:00
Sumanth Gundapaneni
e78156a0e2
Scalarize the vector inputs to llvm.lround intrinsic by default. (#101054)
Verifier is updated in a different patch to let the vector types for
llvm.lround and llvm.llround intrinsics.
2024-08-21 12:13:56 -05:00
Björn Pettersson
278fc8efdf
[DAGCombiner] Fix ReplaceAllUsesOfValueWith mutation bug in visitFREEZE (#104924)
In visitFREEZE we have been collecting a set/vector of
MaybePoisonOperands that later was iterated over, applying a freeze to
those operands. However, C-level fuzzy testing has discovered that the
recursiveness of ReplaceAllUsesOfValueWith may cause later operands in
the MaybePoisonOperands vector to be replaced when replacing an earlier
operand. That would then turn up as
   Assertion `N1.getOpcode() != ISD::DELETED_NODE &&
              "Operand is DELETED_NODE!"' failed.
failures when trying to freeze those later operands.

So we need to make sure that the vector with MaybePoisonOperands is
mutated as well when needed. Or as the solution used in this patch, make
sure to keep track of operand numbers that should be frozen instead of
having a vector of SDValues. And then we can refetch the operands while
iterating over operand numbers.

The problem was seen after adding SELECT_CC to the set of operations
including in "AllowMultipleMaybePoisonOperands". I'm not sure, but I
guess that this could happen for other operations as well for which we
allow multiple maybe poison operands.
2024-08-21 17:56:27 +02:00
Simon Pilgrim
8109e5de57
[DAG] Add select_cc -> abd folds (#102137)
Fixes #100810
2024-08-21 12:07:40 +01:00
Tianqing Wang
7f87b5bf0e
[SelectionDAG][X86] Preserve unpredictable metadata for conditional branches in SelectionDAG, as well as JCCs generated by X86 backend. (#102101)
This builds on 09515f2c2, which preserves unpredictable metadata in
CodeGen for `select`. This patch does it for conditional branches.
2024-08-19 11:04:48 +08:00
Craig Topper
067f2e9f18 [SelectionDAG] Use getSignedConstant/getAllOnesConstant. 2024-08-17 00:04:01 -07:00
Craig Topper
321de07b77
[DAGCombiner] Remove TRUNCATE_(S/U)SAT_(S/U) from an assert that isn't tested. NFC (#104466) 2024-08-16 08:42:55 -07:00
Craig Topper
e027e04f01
[DAGCombiner] Don't let scalarizeBinOpOfSplats create illegal scalar MULHS/MULHU (#104518)
Type legalization lacks generic support for these operations. They are
normally only created when the type is legal. This scalarization case is
new.

We could update type legalization, but there some corner cases that make
it not straightforward. For example, if the promoted type isn't 2x the
narrow type we need to over promote.

Fixes #104480
2024-08-15 21:07:22 -07:00
YunQiang Su
fb9e685fc4
Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649)
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.

This patch introduces support only support for scalar values. The
support of
  vector (vp, vp.reduce, vector.reduce),
  experimental.constrained
will be added in future patches.

With this patch, MIPSr6 and LoongArch can work out of box with
fcanonical and fmax/fmin.

Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.

The FMIN/FMAX of RISC-V instructions follows the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in
future patch.

Background

https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platform for NUM vs sNaN:
   1) Fallback to fmin(3)/fmax(3): return qNaN.
   2) ARM64/ARM32+Neon: same as libc.
   3) MIPSr6/LoongArch/RISC-V: return NUM.

And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will submit as separated patches.
2024-08-15 14:09:36 +08:00
Froster
234cb4c6e3
[SelectionDAG] Scalarize binary ops of splats before legal types (#100749)
Fixes #65072. This allows binary ops of splats to be scalarized if the
operation isn't legal on the element type isn't legal, but is legal on
the type it will be legalized to. I assume if an Op is legal both in
scalar and vector, choose scalar version should always be better no
matter what the type is.

There are some cases that my approach can't scalarize, for example:
``` llvm
; test/CodeGen/RISCV/rvv/select-int.ll
define <vscale x 4 x i64> @select_nxv4i64(i1 zeroext %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b) {
  %v = select i1 %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b
  ret <vscale x 4 x i64> %v
}
```
https://godbolt.org/z/xzqrKrxvK
`xor (splat i1, splat i1)` is generated in late step after LegalizeType,
from select. I didn't figure out how to make `xor i1, i1` legal at this
time.

---------

Co-authored-by: Luke Lau <luke@igalia.com>
2024-08-15 00:07:00 +08:00
Kazu Hirata
5ce326ccb1
[SelectionDAG] Construct SmallVector with ArrayRef (NFC) (#103705) 2024-08-14 08:22:20 -07:00
hanbeom
0d074ba197
[DAG] Support saturated truncate (#99418)
A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it.

Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs:

`ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and  the range of values matches the range of signed values of the  destination type.

`ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type.

`ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type.

These ISDs indicate a saturated truncate.

Fixes https://github.com/llvm/llvm-project/issues/85903
2024-08-14 09:52:36 +01:00
Craig Topper
51bad732dc [SelectionDAG] Replace EVTToAPFloatSemantics with MVT/EVT::getFltSemantics. (#103001) 2024-08-13 11:35:28 -07:00
Pierre van Houtryve
7389545d0d
Reapply "[AMDGPU] Always lower s/udiv64 by constant to MUL" (#101942)
Reland #100723, fixing the ARM issue at the cost of a small loss of optimization in `test/CodeGen/AMDGPU/fshr.ll`

Solves #100383
2024-08-12 09:00:22 +02:00
Kazu Hirata
f4fb735840
[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578) 2024-08-09 09:15:42 -07:00
Simon Pilgrim
ad00e8a8dd [DAG] Replace m_SpecificInt(1) -> m_One()
For SDPatternMatch there's no difference in undef/poison vector element handling - in fact m_One() just wraps m_SpecificInt(1)
2024-08-08 18:20:46 +01:00
Simon Pilgrim
13d04fa560 [DAG] Add legalization handling for ABDS/ABDU (#92576) (REAPPLIED)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv

REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
2024-08-08 11:39:05 +01:00
Simon Pilgrim
e4e96b3e26 Revert b1234ddbe2652aa7948242a57107ca7ab12fd2f8. "[DAG] Add legalization handling for ABDS/ABDU (#92576)"
Reverting #92576 while we identify a reported regression
2024-08-07 17:11:25 +01:00
Simon Pilgrim
6e60d549d4 [DAG] Add foldSelectToABD helper. NFC.
Pull out of visitVSELECT to allow reuse in the future.
2024-08-06 13:31:53 +01:00
Simon Pilgrim
b1234ddbe2
[DAG] Add legalization handling for ABDS/ABDU (#92576)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
2024-08-06 10:18:06 +01:00
Luke Lau
33fc322696
[SelectionDAG] Simplify vselect true, T, F -> T (#100992)
This addresses a TODO where we can fold a vselect to it's true operand
if the boolean is known to be all trues, by factoring out the logic from
extractBooleanFlip which checks TLI.getBooleanContents.
2024-08-06 10:49:20 +08:00
Kazu Hirata
8d1b17b662
[CodeGen] Construct SmallVector with ArrayRef (NFC) (#101841) 2024-08-04 00:41:29 -07:00
Michael Maitland
22ce33304e Revert "[DAG][NFC] Use SDPatternMatch for VScale in some instances"
This reverts commit d2304427cb0270259bc083a3db27413823f56e59.

The m_Add and m_Mul are commutative but the code does not expect the
communtativity.
2024-07-31 04:55:13 -07:00
Michael Maitland
d2304427cb [DAG][NFC] Use SDPatternMatch for VScale in some instances 2024-07-29 06:50:27 -07:00
Matt Davis
404071b059
[SelectionDAG] Preserve volatile undef stores. (#99918)
This patch preserves `undef` SDNodes that are `volatile` qualified.
Previously, these nodes would be discarded. The motivation behind this
change is to adhere to the
[LangRef](https://llvm.org/docs/LangRef.html#volatile-memory-accesses),
even though that doc is mostly in terms of LLVM-IR, it seems reasonable
to imply that the volatile constraints also imply to SDNodes.

> Certain memory accesses, such as
[load](https://llvm.org/docs/LangRef.html#i-load)’s,
[store](https://llvm.org/docs/LangRef.html#i-store)’s, and
[llvm.memcpy](https://llvm.org/docs/LangRef.html#int-memcpy)’s may be
marked volatile. The optimizers must not change the number of volatile
operations or change their order of execution relative to other volatile
operations. The optimizers may change the order of volatile operations
relative to non-volatile operations. This is not Java’s “volatile” and
has no cross-thread synchronization behavior.

Source: https://llvm.org/docs/LangRef.html#volatile-memory-accesses
2024-07-24 08:41:56 -04:00
David Green
b42fe6740e
[DAG] Add users of operand of simplified extract_vector_elt to worklist (#100074)
This helps to ensure we revisit the last extract_element uses of a node
so that it can be optimized away in cases such as extract(insert(scalartovec(x), 1), 0).
2024-07-23 16:34:09 +01:00
Björn Pettersson
2b78303e3f
[DAGCombiner] Freeze maybe poison operands when folding select to logic (#84924)
Just like for regular IR we need to treat SELECT as conditionally
blocking poison in SelectionDAG. So (unless the condition itself is
poison) the result is only poison if the selected true/false value is
poison.
Thus, when doing DAG combines that turn SELECT into arithmetic/logical
operations (e.g. AND/OR) we need to make sure that the new operations
aren't more poisonous. One way to do that is to use FREEZE to make
sure the operands aren't posion.

This patch aims at fixing the kind of miscompiles reported in
  https://github.com/llvm/llvm-project/issues/84653
and
  https://github.com/llvm/llvm-project/issues/85190

Solution is to make sure that we insert FREEZE, if needed to make
the fold sound, when using the foldBoolSelectToLogic and
foldVSelectToSignBitSplatMask DAG combines.
2024-07-22 17:19:46 +02:00
Bjorn Pettersson
8ebe7e60f5 [DAGCombiner] Push freeze through SETCC and SELECT_CC (#64718)
Allow pushing freeze through SETCC and SELECT_CC even if there are
multiple "maybe poison" operands. In the past we have limited it to
a single "maybe poison" operand, but it seems profitable to also
allow the multiple operand scenario.

One goal here is to avoid some regressions seen in review of
  https://github.com/llvm/llvm-project/pull/84924
when solving the select->and miscompiles described in
  https://github.com/llvm/llvm-project/issues/84653
2024-07-22 16:01:59 +02:00
Simon Pilgrim
f406d83d95 [DAG] widenCtPop - reuse existing SDLoc. NFC. 2024-07-22 11:24:23 +01:00
Joseph Huber
615b7eeaa9 Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.

I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
2024-07-20 09:29:31 -05:00
NAKAMURA Takumi
740161a9b9 Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69.
(llvmorg-19-init-17714-gc05126bdfc3b)
See #99610
2024-07-20 12:36:57 +09:00
Simon Pilgrim
497ea1d849 [DAG] tryToFoldExtendSelectLoad - reuse existing SDLoc. NFC. 2024-07-18 16:19:15 +01:00
Lawrence Benson
177ce1900f
[LLVM] Add llvm.experimental.vector.compress intrinsic (#92289)
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.

The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many target as it
avoids branches.

In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.

See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
2024-07-17 14:24:24 +02:00
Joseph Huber
c05126bdfc
[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevent internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts-out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but does not want the overhead of
uncalled functions. (This adds like a second to the link time trivially)
2024-07-16 06:22:09 -05:00
Simon Pilgrim
290537238b [X86] visitADDLike - pull out repeated SDLoc. NFC. 2024-07-15 17:20:58 +01:00
Simon Pilgrim
3560e1d0ce [DAG] visitADDLike - convert (A-B)+(C-D) --> (A+C)-(B+D) fold to sd_match. NFC. 2024-07-15 17:20:58 +01:00
Simon Pilgrim
ba8792b667 [X86] visitFCOPYSIGN - pull out repeated SDLoc. NFC. 2024-07-15 16:34:21 +01:00
Simon Pilgrim
61a4e1e70f
[DAG] Add SDPatternMatch::m_SetCC and update some combines to use it (#98646)
The plan is to add more TernaryOp in the future (SELECT/VSELECT and FMA in particular)
2024-07-14 17:18:43 +01:00
Kazu Hirata
66cd2e0f9a
[CodeGen] Use range-based for loops (NFC) (#98706) 2024-07-13 13:29:47 -07:00
AtariDreams
4f8b2fff6d
[DAG] Use break instead of continue to leave do while (false) loop (NFC) (#97966) 2024-07-10 20:51:06 +04:00
Craig Topper
33112cbf59 [DAGCombiner] Remove unnecessary assert from getShiftAmountTy wrapper. NFC
The same assert appears in the TargetLowering function.

Refine comment to describe as a convenience wrapper and leave it to
TargetLowering documentation to explain.
2024-07-04 19:05:54 -07:00