3921 Commits

Author SHA1 Message Date
Simon Pilgrim
056cf936a7
[DAG] Fold (and X, (bswap/bitreverse (not Y))) -> (and X, (not (bswap/bitreverse Y))) (#112547)
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT

Fixes #112425
2024-10-28 11:52:44 +00:00
James Chesterman
11c818816d
[AArch64] Improve index selection for histograms (#111150)
Removes unnecessary extends on the indices passed into histogram instructions. It also removes the instruction when the mask is zero.
2024-10-22 11:14:00 +01:00
Simon Pilgrim
f0b3b6d15b [DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710) (REAPPLIED)
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.

Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.

X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type

Minor cleanup that helps with #107423

Reapplied after regression fix ba1255def64a9c3c68d97ace051eec76f546eeb0
2024-10-20 14:23:21 +01:00
Simon Pilgrim
ba1255def6 [DAG] Use FoldConstantArithmetic to constant fold (and (ext (and V, c1)), c2) -> (and (ext V), (and c1, (ext c2)))
Noticed while triaging the regression from #112710 noticed by @mstorsjo - don't rely on isConstantIntBuildVectorOrConstantInt+getNode to guarantee constant folding (if it fails to constant fold it will infinite loop), use FoldConstantArithmetic instead.
2024-10-20 13:05:23 +01:00
Martin Storsjö
b26df3e463 Revert "[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710)"
This reverts commit a630771b28f4b252e2754776b8f3ab416133951a.

This caused compilation to hang for Windows/ARM, see
https://github.com/llvm/llvm-project/pull/112710 for details.
2024-10-20 00:49:16 +03:00
Simon Pilgrim
93ec08d629 [DAG] Move SIGN_EXTEND_INREG constant folding inside FoldConstantArithmetic
Update visitSIGN_EXTEND_INREG to call FoldConstantArithmetic instead of getNode.
2024-10-19 20:57:07 +01:00
Simon Pilgrim
e1330d96a0 [DAG] visitFMA/FDIV - avoid SDLoc duplication. NFC. 2024-10-18 11:57:41 +01:00
Simon Pilgrim
5c37316b54 [DAG] visitFMA/FMAD - use FoldConstantArithmetic to add missing vector constant folding support 2024-10-18 11:12:06 +01:00
Simon Pilgrim
a630771b28
[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710)
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.

Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.

X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type

Minor cleanup that helps with #107423
2024-10-18 10:52:55 +01:00
Simon Pilgrim
3ec1b1a4dd [DAG] visitFP_EXTEND - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
2024-10-18 10:10:44 +01:00
Simon Pilgrim
3a1df05ca9 [DAG] visitFP_ROUND - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
2024-10-18 10:10:43 +01:00
Simon Pilgrim
7a43be1690 [DAG] visitXROUND - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
2024-10-18 10:10:43 +01:00
Simon Pilgrim
c72992bf89 [DAG] visitABS - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.

Cleanup for #112682
2024-10-18 10:10:43 +01:00
Simon Pilgrim
256bbdb3f6 [DAG] visitFCEIL/FTRUNC/FFLOOR/FNEG - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.

Cleanup for #112682
2024-10-17 16:53:44 +01:00
Simon Pilgrim
cf046c8717 [DAG] visitSIGN_EXTEND_INREG - avoid SDLoc duplication. NFC. 2024-10-17 12:51:11 +01:00
Simon Pilgrim
5692a0c6f8 [DAG] visitFP_TO_SINT/FP_TO_UINT - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.

Cleanup for #112682
2024-10-17 12:50:09 +01:00
Simon Pilgrim
784c15a282 [DAG] visitSINT_TO_FP/UINT_TO_FP - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantIntBuildVectorOrConstantInt followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.

Cleanup for #112682
2024-10-17 12:50:09 +01:00
Simon Pilgrim
8268bc48eb [DAG] Avoid SDLoc duplication in FP<->INT combines. NFC. 2024-10-17 12:50:09 +01:00
Lewis Crawford
f5f00764ab
[DAGCombiner] Fix check for extending loads (#112182)
Fix a check for extending loads in DAGCombiner,
where if the result type has more bits than the
loaded type it should count as an extending load.

All backends apart from AArch64 ignore this
ExtTy argument to shouldReduceLoadWidth, so this
change currently only impacts AArch64.
2024-10-16 13:23:46 +01:00
Simon Pilgrim
25b702f263 [DAG] visitXOR - add missing comment for or/and constant setcc demorgan fold. NFC.
Noticed while triaging #112347 which is using this fold - we described the or->and fold, but not the equivalent and->or which is also handled.
2024-10-16 11:15:36 +01:00
Simon Pilgrim
30deb76d46 [DAG] visitXOR - add missing comment for or/and constant demorgan fold. NFC.
Noticed while triaging #112347 which is using this fold - we described the or->and fold, but not the equivalent and->or which is also handled.
2024-10-15 16:32:27 +01:00
c8ef
854ded9b24
Reapply "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112203)
This patch adds icmp+select patterns for integer min/max matchers in
SDPatternMatch, similar to those in IR PatternMatch.

Reapply #111774.

Closes #108218.
2024-10-15 21:07:06 +08:00
c8ef
a3b0c31ebc
Revert "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112200)
Reverts llvm/llvm-project#111774

This appears to be causing some tests to fail.
2024-10-14 21:43:49 +08:00
c8ef
11f625cb87
[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes. (#111774)
Closes #108218.

This patch adds icmp+select patterns for integer min/max matchers in
SDPatternMatch, similar to those in IR PatternMatch.
2024-10-14 21:19:34 +08:00
Oliver Stannard
1e49670b31
[DAGISel] Keep flags when converting FP load/store to integer (#111679)
This DAG combine replaces a floating-point load/store pair which has no
other uses with an integer one, but did not copy the memory operand
flags to the new instructions, resulting in it dropping the volatile
flag. This optimisation is still valid if one or both of the
instructions is volatile, so we can copy over the whole
MachineMemOperand to generate volatile integer loads and stores where
needed.
2024-10-10 09:17:50 +01:00
Simon Pilgrim
1dcb6dc757 [DAG] foldVSelectToSignBitSplatMask - pull out repeated code and use getShiftAmountConstant helper.
We're assuming shift amount type matches the result type - which is true for vectors, but I'm hoping to generalize this fold in the future.
2024-10-08 17:36:34 +01:00
Simon Pilgrim
520562c597 Revert 412d59f0a510a05c08ed45545943dfd2f901bc5d "[DAG] combineShiftToMULH - handle zext nneg as sext"
Reverting until I can investigate a miscompilation reported by @mstorsjo
2024-10-01 10:52:27 +01:00
Simon Pilgrim
412d59f0a5 [DAG] combineShiftToMULH - handle zext nneg as sext
Fixes poor codegen on AVX512 targets for a test case from #109790
2024-09-30 12:12:32 +01:00
Pawan Nirpal
26f272ebbd
[X86][SelectionDAG] - Add support for llvm.canonicalize intrinsic (#106370)
Enable support for fcanonicalize intrinsic lowering.
2024-09-23 12:15:38 +01:00
Pierre van Houtryve
758444ca3e
[AMDGPU] Promote uniform ops to I32 in DAGISel (#106383)
Promote uniform binops, selects and setcc between 2 and 16 bits to 32
bits in DAGISel

Solves #64591
2024-09-19 09:00:21 +02:00
David Green
2242cd2b6a
[DAG] Fold vecreduce.or(sext(x)) to sext(vecreduce.or(x)) (#108959)
The same is true for and / xor reductions, where the sext / zext can be
sank down through the bitwise operation.
https://alive2.llvm.org/ce/z/TvzCd5
2024-09-17 15:24:00 +01:00
Matt Arsenault
c49a1ae6d6 DAG: Reorder isFMAFasterThanFMulAndFAdd checks (NFC)
Basic legality checks should be first.
2024-09-15 16:33:01 +04:00
Robert Dazi
8837898b8d
[DAGCombine] Count leading ones: refine post DAG/Type Legalisation if promotion (#102877)
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.

This PR has two parts:

* `SDPatternMatch/MatchContext`: Modify a little bit the code to match
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of only `m_Opc`
supported. Some tests were added to ensure no regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detect and rewrite the
patterns using matching context.

Remaining Tasks:

- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead ?
- [ ] Style: Add a visitVP_SUB ?? Move `foldSubCtlzNot` in another
location for style consistency purpose ?

@topperc

---------

Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
2024-09-15 15:48:36 +04:00
Simon Pilgrim
5910e8d607 [DAG] visitUDIV - call SimplifyDemandedBits to handle hidden constant foldable cases
Fixes #108728
2024-09-15 12:29:28 +01:00
Simon Pilgrim
69a21154ca
[DAG] Fold trunc(srl(extract_elt(vec,c1),c2)) -> extract_elt(bitcast(vec),c3) (#107987)
Extends existing trunc(extract_elt(vec,c1)) -> extract_elt(bitcast(vec),c3) fold.

Noticed while working on #107404
2024-09-13 15:13:58 +01:00
Simon Pilgrim
6ec889e53f [DAG] Add support for neg(abd(x,y)) patterns.
Currently limited to cases which have legal/custom ABDS/ABDU handling - I'll extend this for all targets in future (similar to how we support neg(abs(x))) once I've addressed some outstanding regressions on aarch64/riscv.

Helps avoid a lot of extra cmov instructions on x86 in particular, and allows us to more easily improve the codegen in future commits.
2024-09-06 13:16:09 +01:00
Princeton Ferro
8f77d37f25
[DAGCombiner] cache negative result from getMergeStoreCandidates() (#106949)
Cache negative search result from getStoreMergeCandidates() so that
mergeConsecutiveStores() does not iterate quadratically over a
potentially long sequence of unmergeable stores.
2024-09-04 18:18:53 +04:00
Simon Pilgrim
b25b9a7d6c [DAG] visitSELECT - add "select usubo(x, y).overflow, (sub y, x), (usubo x, y) -> abdu(x, y)" fold (and neg equivalent)
Handle cases where CGP has merged the CMP+SUB into a USUBO node - improves a few outstanding niggles from #100810
2024-09-04 11:59:10 +01:00
Simon Pilgrim
4baf29e81e [DAG] Handle cases where a shift amount is larger than the pre-extended value bitwidth
In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds.

Fixes #106202
2024-08-27 18:12:24 +01:00
Simon Pilgrim
807557654a [DAG] visitTRUNCATE_USAT_U - use sd_match to match FP_TO_UINT_SAT pattern. NFC. 2024-08-23 16:39:32 +01:00
Sumanth Gundapaneni
e78156a0e2
Scalarize the vector inputs to llvm.lround intrinsic by default. (#101054)
Verifier is updated in a different patch to let the vector types for
llvm.lround and llvm.llround intrinsics.
2024-08-21 12:13:56 -05:00
Björn Pettersson
278fc8efdf
[DAGCombiner] Fix ReplaceAllUsesOfValueWith mutation bug in visitFREEZE (#104924)
In visitFREEZE we have been collecting a set/vector of
MaybePoisonOperands that later was iterated over, applying a freeze to
those operands. However, C-level fuzzy testing has discovered that the
recursiveness of ReplaceAllUsesOfValueWith may cause later operands in
the MaybePoisonOperands vector to be replaced when replacing an earlier
operand. That would then turn up as
   Assertion `N1.getOpcode() != ISD::DELETED_NODE &&
              "Operand is DELETED_NODE!"' failed.
failures when trying to freeze those later operands.

So we need to make sure that the vector with MaybePoisonOperands is
mutated as well when needed. Or as the solution used in this patch, make
sure to keep track of operand numbers that should be frozen instead of
having a vector of SDValues. And then we can refetch the operands while
iterating over operand numbers.

The problem was seen after adding SELECT_CC to the set of operations
including in "AllowMultipleMaybePoisonOperands". I'm not sure, but I
guess that this could happen for other operations as well for which we
allow multiple maybe poison operands.
2024-08-21 17:56:27 +02:00
Simon Pilgrim
8109e5de57
[DAG] Add select_cc -> abd folds (#102137)
Fixes #100810
2024-08-21 12:07:40 +01:00
Tianqing Wang
7f87b5bf0e
[SelectionDAG][X86] Preserve unpredictable metadata for conditional branches in SelectionDAG, as well as JCCs generated by X86 backend. (#102101)
This builds on 09515f2c2, which preserves unpredictable metadata in
CodeGen for `select`. This patch does it for conditional branches.
2024-08-19 11:04:48 +08:00
Craig Topper
067f2e9f18 [SelectionDAG] Use getSignedConstant/getAllOnesConstant. 2024-08-17 00:04:01 -07:00
Craig Topper
321de07b77
[DAGCombiner] Remove TRUNCATE_(S/U)SAT_(S/U) from an assert that isn't tested. NFC (#104466) 2024-08-16 08:42:55 -07:00
Craig Topper
e027e04f01
[DAGCombiner] Don't let scalarizeBinOpOfSplats create illegal scalar MULHS/MULHU (#104518)
Type legalization lacks generic support for these operations. They are
normally only created when the type is legal. This scalarization case is
new.

We could update type legalization, but there some corner cases that make
it not straightforward. For example, if the promoted type isn't 2x the
narrow type we need to over promote.

Fixes #104480
2024-08-15 21:07:22 -07:00
YunQiang Su
fb9e685fc4
Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649)
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.

This patch introduces support only support for scalar values. The
support of
  vector (vp, vp.reduce, vector.reduce),
  experimental.constrained
will be added in future patches.

With this patch, MIPSr6 and LoongArch can work out of box with
fcanonical and fmax/fmin.

Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.

The FMIN/FMAX of RISC-V instructions follows the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in
future patch.

Background

https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platform for NUM vs sNaN:
   1) Fallback to fmin(3)/fmax(3): return qNaN.
   2) ARM64/ARM32+Neon: same as libc.
   3) MIPSr6/LoongArch/RISC-V: return NUM.

And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will submit as separated patches.
2024-08-15 14:09:36 +08:00
Froster
234cb4c6e3
[SelectionDAG] Scalarize binary ops of splats before legal types (#100749)
Fixes #65072. This allows binary ops of splats to be scalarized if the
operation isn't legal on the element type isn't legal, but is legal on
the type it will be legalized to. I assume if an Op is legal both in
scalar and vector, choose scalar version should always be better no
matter what the type is.

There are some cases that my approach can't scalarize, for example:
``` llvm
; test/CodeGen/RISCV/rvv/select-int.ll
define <vscale x 4 x i64> @select_nxv4i64(i1 zeroext %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b) {
  %v = select i1 %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b
  ret <vscale x 4 x i64> %v
}
```
https://godbolt.org/z/xzqrKrxvK
`xor (splat i1, splat i1)` is generated in late step after LegalizeType,
from select. I didn't figure out how to make `xor i1, i1` legal at this
time.

---------

Co-authored-by: Luke Lau <luke@igalia.com>
2024-08-15 00:07:00 +08:00
Kazu Hirata
5ce326ccb1
[SelectionDAG] Construct SmallVector with ArrayRef (NFC) (#103705) 2024-08-14 08:22:20 -07:00