3937 Commits

Author SHA1 Message Date
Vitaly Buka
d57892a2a1
Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc" (#118693)
Reverts llvm/llvm-project#117566

Breaks libc++ tests with HWASAN
https://lab.llvm.org/buildbot/#/builders/55/builds/3959
2024-12-04 12:36:46 -08:00
Oliver Stannard
99b862efba
[DAGISel][ARM] Fix vector truncate combine for big-endian (#118101)
This DAG combine was incorrect for big-endian targets, because it
assumes that when a bitcast changes the lane width, the
least-significant bits of the wider lanes are in the lower-numbered
lanes of the smaller type, which is only true for little-endian.
2024-12-04 14:32:15 +00:00
David Sherwood
4675db5f39
[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566)
For IR like this:

%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1

where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into

%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5

For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.

NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file

CodeGen/AArch64/extract-vector-cmp.ll

because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
2024-12-04 10:26:51 +00:00
fengfeng
7907292daa
[DAG] Apply Disjoint flag. (#118045)
or disjoint (or disjoint (x, c0), c1)
-->
or disjont x, or (c0, c1)
Alive2: https://alive2.llvm.org/ce/z/3wPth5

---------

Signed-off-by: feng.feng <feng.feng@iluvatar.com>
2024-12-03 09:21:03 +08:00
Simon Pilgrim
31b7d4333a
[DAG] Extend extract_element(bitcast(scalar_to_vector(X))) -> trunc(srl(X,C)) (#117900)
When extracting a smaller integer from a scalar_to_vector source, we were limited to only folding/truncating the lowest bits of the scalar source.

This patch extends the fold to handle extraction of any other element, by right shifting the source before truncation.

Fixes a regression from #117884
2024-11-29 17:24:38 +00:00
David Sherwood
9b76e7fc60
Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)" (#117556)
This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.
2024-11-25 13:49:21 +00:00
David Sherwood
22ec44f509
[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)
For IR like this:

  %icmp = icmp ult <4 x i32> %a, splat (i32 5)
  %res = extractelement <4 x i1> %icmp, i32 1

where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into

  %ext = extractelement <4 x i32> %a, i32 1
  %res = icmp ult i32 %ext, 5

For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.

NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file

CodeGen/AArch64/extract-vector-cmp.ll

because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
2024-11-25 09:25:01 +00:00
Jonathan Cohen
00d383ee9d
[DAGCombiner] Limit steps in shouldCombineToPostInc (#116030)
Currently the function will walk the entire DAG to find other candidates
to perform a post-inc store. This leads to very long compilation times
on large functions. Added a MaxSteps limit to avoid this, which is also
aligned to how hasPredecessorHelper is used elsewhere in the code.
2024-11-21 11:58:37 +02:00
David Sherwood
b9dd60228c
[DAGCombiner] Remove a hasOneUse check in visitAND (#115142)
For some reason there was a hasOneUse check on the splat for the
second operand and it's not obvious to me why. The check blocks
optimisations for lowering of nodes like AVGFLOORU and AVGCEILU.

In a follow-on patch I also plan to improve the generated code
for AVGCEILU further by teaching computeKnownBits about
zero-extending masked loads.
2024-11-08 08:20:31 +00:00
Yingwei Zheng
f74aed7938
[DAGCombiner] Add basic support for trunc nsw/nuw (#113808)
This patch adds basic support for `trunc nsw/nuw` in SDAG. It will allow
DAGCombiner to further eliminate in-reg `zext/sext` instructions.
2024-11-07 00:23:53 +08:00
Simon Pilgrim
aef0e77c76
[DAG] visitAND - Fold (and (srl X, C), 1) -> (srl X, BW-1) for signbit extraction (#114992)
If we're masking the LSB of a SRL node result and that is shifting down an extended sign bit, see if we can change the SRL to shift down the MSB directly.

These patterns can occur during legalisation when we've sign extended to a wider type but the SRL is still shifting from the subreg.

Alternative to #114967

Fixes the remaining regression in #112588
2024-11-05 14:42:15 +00:00
Simon Pilgrim
5df387227e [DAG] visitAND - refactor "and (sub 0, ext(bool X)), 1 --> zext(bool X)" to use SDPatternMatch. 2024-11-05 14:30:21 +00:00
Kazu Hirata
6927a434ba
[SelectionDAG] Remove unused includes (NFC) (#114697)
Identified with misc-include-cleaner.
2024-11-03 07:03:33 -08:00
dnsampaio
28d0718033
[DAGCombiner] Add combine avg from shifts (#113909)
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`

as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` 
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`

iff valid for the target.

Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
2024-10-31 10:57:27 +01:00
Yingwei Zheng
cf9d1c1486
[SDAG] Simplify SDNodeFlags with bitwise logic (#114061)
This patch allows using enumeration values directly and simplifies the
implementation with bitwise logic. It addresses the comment in
https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.
2024-10-31 08:10:07 +08:00
Simon Pilgrim
f7b5f0c805 [DAG] Fold (and X, (rot (not Y), Z)) -> (and X, (not (rot Y, Z)))
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT

Followup to #112547
2024-10-30 10:46:12 +00:00
Simon Pilgrim
056cf936a7
[DAG] Fold (and X, (bswap/bitreverse (not Y))) -> (and X, (not (bswap/bitreverse Y))) (#112547)
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT

Fixes #112425
2024-10-28 11:52:44 +00:00
James Chesterman
11c818816d
[AArch64] Improve index selection for histograms (#111150)
Removes unnecessary extends on the indices passed into histogram instructions. It also removes the instruction when the mask is zero.
2024-10-22 11:14:00 +01:00
Simon Pilgrim
f0b3b6d15b [DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710) (REAPPLIED)
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.

Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.

X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type

Minor cleanup that helps with #107423

Reapplied after regression fix ba1255def64a9c3c68d97ace051eec76f546eeb0
2024-10-20 14:23:21 +01:00
Simon Pilgrim
ba1255def6 [DAG] Use FoldConstantArithmetic to constant fold (and (ext (and V, c1)), c2) -> (and (ext V), (and c1, (ext c2)))
Noticed while triaging the regression from #112710 noticed by @mstorsjo - don't rely on isConstantIntBuildVectorOrConstantInt+getNode to guarantee constant folding (if it fails to constant fold it will infinite loop), use FoldConstantArithmetic instead.
2024-10-20 13:05:23 +01:00
Martin Storsjö
b26df3e463 Revert "[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710)"
This reverts commit a630771b28f4b252e2754776b8f3ab416133951a.

This caused compilation to hang for Windows/ARM, see
https://github.com/llvm/llvm-project/pull/112710 for details.
2024-10-20 00:49:16 +03:00
Simon Pilgrim
93ec08d629 [DAG] Move SIGN_EXTEND_INREG constant folding inside FoldConstantArithmetic
Update visitSIGN_EXTEND_INREG to call FoldConstantArithmetic instead of getNode.
2024-10-19 20:57:07 +01:00
Simon Pilgrim
e1330d96a0 [DAG] visitFMA/FDIV - avoid SDLoc duplication. NFC. 2024-10-18 11:57:41 +01:00
Simon Pilgrim
5c37316b54 [DAG] visitFMA/FMAD - use FoldConstantArithmetic to add missing vector constant folding support 2024-10-18 11:12:06 +01:00
Simon Pilgrim
a630771b28
[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710)
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.

Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.

X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type

Minor cleanup that helps with #107423
2024-10-18 10:52:55 +01:00
Simon Pilgrim
3ec1b1a4dd [DAG] visitFP_EXTEND - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
2024-10-18 10:10:44 +01:00
Simon Pilgrim
3a1df05ca9 [DAG] visitFP_ROUND - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
2024-10-18 10:10:43 +01:00
Simon Pilgrim
7a43be1690 [DAG] visitXROUND - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
2024-10-18 10:10:43 +01:00
Simon Pilgrim
c72992bf89 [DAG] visitABS - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.

Cleanup for #112682
2024-10-18 10:10:43 +01:00
Simon Pilgrim
256bbdb3f6 [DAG] visitFCEIL/FTRUNC/FFLOOR/FNEG - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.

Cleanup for #112682
2024-10-17 16:53:44 +01:00
Simon Pilgrim
cf046c8717 [DAG] visitSIGN_EXTEND_INREG - avoid SDLoc duplication. NFC. 2024-10-17 12:51:11 +01:00
Simon Pilgrim
5692a0c6f8 [DAG] visitFP_TO_SINT/FP_TO_UINT - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.

Cleanup for #112682
2024-10-17 12:50:09 +01:00
Simon Pilgrim
784c15a282 [DAG] visitSINT_TO_FP/UINT_TO_FP - use FoldConstantArithmetic to attempt to constant fold
Don't rely on isConstantIntBuildVectorOrConstantInt followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.

Cleanup for #112682
2024-10-17 12:50:09 +01:00
Simon Pilgrim
8268bc48eb [DAG] Avoid SDLoc duplication in FP<->INT combines. NFC. 2024-10-17 12:50:09 +01:00
Lewis Crawford
f5f00764ab
[DAGCombiner] Fix check for extending loads (#112182)
Fix a check for extending loads in DAGCombiner,
where if the result type has more bits than the
loaded type it should count as an extending load.

All backends apart from AArch64 ignore this
ExtTy argument to shouldReduceLoadWidth, so this
change currently only impacts AArch64.
2024-10-16 13:23:46 +01:00
Simon Pilgrim
25b702f263 [DAG] visitXOR - add missing comment for or/and constant setcc demorgan fold. NFC.
Noticed while triaging #112347 which is using this fold - we described the or->and fold, but not the equivalent and->or which is also handled.
2024-10-16 11:15:36 +01:00
Simon Pilgrim
30deb76d46 [DAG] visitXOR - add missing comment for or/and constant demorgan fold. NFC.
Noticed while triaging #112347 which is using this fold - we described the or->and fold, but not the equivalent and->or which is also handled.
2024-10-15 16:32:27 +01:00
c8ef
854ded9b24
Reapply "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112203)
This patch adds icmp+select patterns for integer min/max matchers in
SDPatternMatch, similar to those in IR PatternMatch.

Reapply #111774.

Closes #108218.
2024-10-15 21:07:06 +08:00
c8ef
a3b0c31ebc
Revert "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112200)
Reverts llvm/llvm-project#111774

This appears to be causing some tests to fail.
2024-10-14 21:43:49 +08:00
c8ef
11f625cb87
[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes. (#111774)
Closes #108218.

This patch adds icmp+select patterns for integer min/max matchers in
SDPatternMatch, similar to those in IR PatternMatch.
2024-10-14 21:19:34 +08:00
Oliver Stannard
1e49670b31
[DAGISel] Keep flags when converting FP load/store to integer (#111679)
This DAG combine replaces a floating-point load/store pair which has no
other uses with an integer one, but did not copy the memory operand
flags to the new instructions, resulting in it dropping the volatile
flag. This optimisation is still valid if one or both of the
instructions is volatile, so we can copy over the whole
MachineMemOperand to generate volatile integer loads and stores where
needed.
2024-10-10 09:17:50 +01:00
Simon Pilgrim
1dcb6dc757 [DAG] foldVSelectToSignBitSplatMask - pull out repeated code and use getShiftAmountConstant helper.
We're assuming shift amount type matches the result type - which is true for vectors, but I'm hoping to generalize this fold in the future.
2024-10-08 17:36:34 +01:00
Simon Pilgrim
520562c597 Revert 412d59f0a510a05c08ed45545943dfd2f901bc5d "[DAG] combineShiftToMULH - handle zext nneg as sext"
Reverting until I can investigate a miscompilation reported by @mstorsjo
2024-10-01 10:52:27 +01:00
Simon Pilgrim
412d59f0a5 [DAG] combineShiftToMULH - handle zext nneg as sext
Fixes poor codegen on AVX512 targets for a test case from #109790
2024-09-30 12:12:32 +01:00
Pawan Nirpal
26f272ebbd
[X86][SelectionDAG] - Add support for llvm.canonicalize intrinsic (#106370)
Enable support for fcanonicalize intrinsic lowering.
2024-09-23 12:15:38 +01:00
Pierre van Houtryve
758444ca3e
[AMDGPU] Promote uniform ops to I32 in DAGISel (#106383)
Promote uniform binops, selects and setcc between 2 and 16 bits to 32
bits in DAGISel

Solves #64591
2024-09-19 09:00:21 +02:00
David Green
2242cd2b6a
[DAG] Fold vecreduce.or(sext(x)) to sext(vecreduce.or(x)) (#108959)
The same is true for and / xor reductions, where the sext / zext can be
sank down through the bitwise operation.
https://alive2.llvm.org/ce/z/TvzCd5
2024-09-17 15:24:00 +01:00
Matt Arsenault
c49a1ae6d6 DAG: Reorder isFMAFasterThanFMulAndFAdd checks (NFC)
Basic legality checks should be first.
2024-09-15 16:33:01 +04:00
Robert Dazi
8837898b8d
[DAGCombine] Count leading ones: refine post DAG/Type Legalisation if promotion (#102877)
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.

This PR has two parts:

* `SDPatternMatch/MatchContext`: Modify a little bit the code to match
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of only `m_Opc`
supported. Some tests were added to ensure no regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detect and rewrite the
patterns using matching context.

Remaining Tasks:

- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead ?
- [ ] Style: Add a visitVP_SUB ?? Move `foldSubCtlzNot` in another
location for style consistency purpose ?

@topperc

---------

Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
2024-09-15 15:48:36 +04:00
Simon Pilgrim
5910e8d607 [DAG] visitUDIV - call SimplifyDemandedBits to handle hidden constant foldable cases
Fixes #108728
2024-09-15 12:29:28 +01:00