llvm-project

Author	SHA1	Message	Date
David Green	2242cd2b6a	[DAG] Fold vecreduce.or(sext(x)) to sext(vecreduce.or(x)) (#108959 ) The same is true for and / xor reductions, where the sext / zext can be sunk down through the bitwise operation. https://alive2.llvm.org/ce/z/TvzCd5	2024-09-17 15:24:00 +01:00
Matt Arsenault	c49a1ae6d6	DAG: Reorder isFMAFasterThanFMulAndFAdd checks (NFC) Basic legality checks should be first.	2024-09-15 16:33:01 +04:00
Robert Dazi	8837898b8d	[DAGCombine] Count leading ones: refine post DAG/Type Legalisation if promotion (#102877 ) This PR is related to #99591. In this PR, instead of modifying how the legalisation occurs depending on surrounding instructions, we refine after legalisation. This PR has two parts: * `SDPatternMatch/MatchContext`: Slightly modify the code that matches Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to make it compatible with `VPMatchContext`, instead of only `m_Opc` being supported. Some tests were added to ensure no regressions. * `DAGCombiner`: Add a `foldSubCtlzNot` which detects and rewrites the patterns using a matching context. Remaining Tasks: - [ ] GlobalISel - [ ] Currently the pattern matching will occur even before legalisation. Should I restrict it to specific stages instead ? - [ ] Style: Add a visitVP_SUB ?? Move `foldSubCtlzNot` in another location for style consistency purpose ? @topperc --------- Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>	2024-09-15 15:48:36 +04:00
Simon Pilgrim	5910e8d607	[DAG] visitUDIV - call SimplifyDemandedBits to handle hidden constant foldable cases Fixes #108728	2024-09-15 12:29:28 +01:00
Simon Pilgrim	69a21154ca	[DAG] Fold trunc(srl(extract_elt(vec,c1),c2)) -> extract_elt(bitcast(vec),c3) (#107987 ) Extends existing trunc(extract_elt(vec,c1)) -> extract_elt(bitcast(vec),c3) fold. Noticed while working on #107404	2024-09-13 15:13:58 +01:00
Simon Pilgrim	6ec889e53f	[DAG] Add support for neg(abd(x,y)) patterns. Currently limited to cases which have legal/custom ABDS/ABDU handling - I'll extend this for all targets in future (similar to how we support neg(abs(x))) once I've addressed some outstanding regressions on aarch64/riscv. Helps avoid a lot of extra cmov instructions on x86 in particular, and allows us to more easily improve the codegen in future commits.	2024-09-06 13:16:09 +01:00
Princeton Ferro	8f77d37f25	[DAGCombiner] cache negative result from getMergeStoreCandidates() (#106949 ) Cache negative search result from getStoreMergeCandidates() so that mergeConsecutiveStores() does not iterate quadratically over a potentially long sequence of unmergeable stores.	2024-09-04 18:18:53 +04:00
Simon Pilgrim	b25b9a7d6c	[DAG] visitSELECT - add "select usubo(x, y).overflow, (sub y, x), (usubo x, y) -> abdu(x, y)" fold (and neg equivalent) Handle cases where CGP has merged the CMP+SUB into a USUBO node - improves a few outstanding niggles from #100810	2024-09-04 11:59:10 +01:00
Simon Pilgrim	4baf29e81e	[DAG] Handle cases where a shift amount is larger than the pre-extended value bitwidth In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds. Fixes #106202	2024-08-27 18:12:24 +01:00
Simon Pilgrim	807557654a	[DAG] visitTRUNCATE_USAT_U - use sd_match to match FP_TO_UINT_SAT pattern. NFC.	2024-08-23 16:39:32 +01:00
Sumanth Gundapaneni	e78156a0e2	Scalarize the vector inputs to llvm.lround intrinsic by default. (#101054 ) The verifier is updated in a different patch to allow vector types for the llvm.lround and llvm.llround intrinsics.	2024-08-21 12:13:56 -05:00
Björn Pettersson	278fc8efdf	[DAGCombiner] Fix ReplaceAllUsesOfValueWith mutation bug in visitFREEZE (#104924 ) In visitFREEZE we have been collecting a set/vector of MaybePoisonOperands that later was iterated over, applying a freeze to those operands. However, C-level fuzz testing has discovered that the recursiveness of ReplaceAllUsesOfValueWith may cause later operands in the MaybePoisonOperands vector to be replaced when replacing an earlier operand. That would then turn up as Assertion `N1.getOpcode() != ISD::DELETED_NODE && "Operand is DELETED_NODE!"' failed. failures when trying to freeze those later operands. So we need to make sure that the vector with MaybePoisonOperands is mutated as well when needed. Or, as the solution used in this patch, make sure to keep track of operand numbers that should be frozen instead of having a vector of SDValues. And then we can refetch the operands while iterating over operand numbers. The problem was seen after adding SELECT_CC to the set of operations included in "AllowMultipleMaybePoisonOperands". I'm not sure, but I guess that this could happen for other operations as well for which we allow multiple maybe poison operands.	2024-08-21 17:56:27 +02:00
Simon Pilgrim	8109e5de57	[DAG] Add select_cc -> abd folds (#102137 ) Fixes #100810	2024-08-21 12:07:40 +01:00
Tianqing Wang	7f87b5bf0e	[SelectionDAG][X86] Preserve unpredictable metadata for conditional branches in SelectionDAG, as well as JCCs generated by X86 backend. (#102101 ) This builds on 09515f2c2, which preserves unpredictable metadata in CodeGen for `select`. This patch does it for conditional branches.	2024-08-19 11:04:48 +08:00
Craig Topper	067f2e9f18	[SelectionDAG] Use getSignedConstant/getAllOnesConstant.	2024-08-17 00:04:01 -07:00
Craig Topper	321de07b77	[DAGCombiner] Remove TRUNCATE_(S/U)SAT_(S/U) from an assert that isn't tested. NFC (#104466 )	2024-08-16 08:42:55 -07:00
Craig Topper	e027e04f01	[DAGCombiner] Don't let scalarizeBinOpOfSplats create illegal scalar MULHS/MULHU (#104518 ) Type legalization lacks generic support for these operations. They are normally only created when the type is legal. This scalarization case is new. We could update type legalization, but there are some corner cases that make it not straightforward. For example, if the promoted type isn't 2x the narrow type we need to over promote. Fixes #104480	2024-08-15 21:07:22 -07:00
YunQiang Su	fb9e685fc4	Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649 ) C23 introduced new functions fminimum_num and fmaximum_num, and they follow the minimumNumber and maximumNumber of IEEE754-2019. Let's introduce new intrinsics to support them. This patch introduces support only for scalar values. The support of vector (vp, vp.reduce, vector.reduce), experimental.constrained will be added in future patches. With this patch, MIPSr6 and LoongArch can work out of the box with fcanonical and fmax/fmin. Aarch64/PowerPC64 can use the same logic as MIPSr6 and LoongArch, while they have no fcanonical support yet. I will add it in future patches. The FMIN/FMAX of RISC-V instructions follows the minimumNumber/maximumNumber of IEEE754-2019. We can just add it in a future patch. Background https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735 Currently we have fminnum/fmaxnum, which have different behavior on different platforms for NUM vs sNaN: 1) Fallback to fmin(3)/fmax(3): return qNaN. 2) ARM64/ARM32+Neon: same as libc. 3) MIPSr6/LoongArch/RISC-V: return NUM. And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008 will be submitted as separate patches.	2024-08-15 14:09:36 +08:00
Froster	234cb4c6e3	[SelectionDAG] Scalarize binary ops of splats before legal types (#100749 ) Fixes #65072. This allows binary ops of splats to be scalarized if the operation isn't legal on the element type, but is legal on the type it will be legalized to. I assume that if an Op is legal both in scalar and vector, choosing the scalar version should always be better no matter what the type is. There are some cases that my approach can't scalarize, for example: ``` llvm ; test/CodeGen/RISCV/rvv/select-int.ll define <vscale x 4 x i64> @select_nxv4i64(i1 zeroext %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b) { %v = select i1 %c, <vscale x 4 x i64> %a, <vscale x 4 x i64> %b ret <vscale x 4 x i64> %v } ``` https://godbolt.org/z/xzqrKrxvK `xor (splat i1, splat i1)` is generated in a late step after LegalizeType, from select. I didn't figure out how to make `xor i1, i1` legal at this time. --------- Co-authored-by: Luke Lau <luke@igalia.com>	2024-08-15 00:07:00 +08:00
Kazu Hirata	5ce326ccb1	[SelectionDAG] Construct SmallVector with ArrayRef (NFC) (#103705 )	2024-08-14 08:22:20 -07:00
hanbeom	0d074ba197	[DAG] Support saturated truncate (#99418 ) A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it. Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs: `ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and the range of values matches the range of signed values of the destination type. `ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type. `ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type. These ISDs indicate a saturated truncate. Fixes https://github.com/llvm/llvm-project/issues/85903	2024-08-14 09:52:36 +01:00
Craig Topper	51bad732dc	[SelectionDAG] Replace EVTToAPFloatSemantics with MVT/EVT::getFltSemantics. (#103001 )	2024-08-13 11:35:28 -07:00
Pierre van Houtryve	7389545d0d	Reapply "[AMDGPU] Always lower s/udiv64 by constant to MUL" (#101942 ) Reland #100723, fixing the ARM issue at the cost of a small loss of optimization in `test/CodeGen/AMDGPU/fshr.ll` Solves #100383	2024-08-12 09:00:22 +02:00
Kazu Hirata	f4fb735840	[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578 )	2024-08-09 09:15:42 -07:00
Simon Pilgrim	ad00e8a8dd	[DAG] Replace m_SpecificInt(1) -> m_One() For SDPatternMatch there's no difference in undef/poison vector element handling - in fact m_One() just wraps m_SpecificInt(1)	2024-08-08 18:20:46 +01:00
Simon Pilgrim	13d04fa560	[DAG] Add legalization handling for ABDS/ABDU (#92576 ) (REAPPLIED) Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization	2024-08-08 11:39:05 +01:00
Simon Pilgrim	e4e96b3e26	Revert b1234ddbe2652aa7948242a57107ca7ab12fd2f8. "[DAG] Add legalization handling for ABDS/ABDU (#92576 )" Reverting #92576 while we identify a reported regression	2024-08-07 17:11:25 +01:00
Simon Pilgrim	6e60d549d4	[DAG] Add foldSelectToABD helper. NFC. Pull out of visitVSELECT to allow reuse in the future.	2024-08-06 13:31:53 +01:00
Simon Pilgrim	b1234ddbe2	[DAG] Add legalization handling for ABDS/ABDU (#92576 ) Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv	2024-08-06 10:18:06 +01:00
Luke Lau	33fc322696	[SelectionDAG] Simplify vselect true, T, F -> T (#100992 ) This addresses a TODO where we can fold a vselect to its true operand if the boolean is known to be all trues, by factoring out the logic from extractBooleanFlip which checks TLI.getBooleanContents.	2024-08-06 10:49:20 +08:00
Kazu Hirata	8d1b17b662	[CodeGen] Construct SmallVector with ArrayRef (NFC) (#101841 )	2024-08-04 00:41:29 -07:00
Michael Maitland	22ce33304e	Revert "[DAG][NFC] Use SDPatternMatch for VScale in some instances" This reverts commit d2304427cb0270259bc083a3db27413823f56e59. The m_Add and m_Mul are commutative but the code does not expect the commutativity.	2024-07-31 04:55:13 -07:00
Michael Maitland	d2304427cb	[DAG][NFC] Use SDPatternMatch for VScale in some instances	2024-07-29 06:50:27 -07:00
Matt Davis	404071b059	[SelectionDAG] Preserve volatile undef stores. (#99918 ) This patch preserves `undef` SDNodes that are `volatile` qualified. Previously, these nodes would be discarded. The motivation behind this change is to adhere to the [LangRef](https://llvm.org/docs/LangRef.html#volatile-memory-accesses); even though that doc is mostly in terms of LLVM-IR, it seems reasonable to infer that the volatile constraints also apply to SDNodes. > Certain memory accesses, such as [load](https://llvm.org/docs/LangRef.html#i-load)’s, [store](https://llvm.org/docs/LangRef.html#i-store)’s, and [llvm.memcpy](https://llvm.org/docs/LangRef.html#int-memcpy)’s may be marked volatile. The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior. Source: https://llvm.org/docs/LangRef.html#volatile-memory-accesses	2024-07-24 08:41:56 -04:00
David Green	b42fe6740e	[DAG] Add users of operand of simplified extract_vector_elt to worklist (#100074 ) This helps to ensure we revisit the last extract_element uses of a node so that it can be optimized away in cases such as extract(insert(scalartovec(x), 1), 0).	2024-07-23 16:34:09 +01:00
Björn Pettersson	2b78303e3f	[DAGCombiner] Freeze maybe poison operands when folding select to logic (#84924 ) Just like for regular IR we need to treat SELECT as conditionally blocking poison in SelectionDAG. So (unless the condition itself is poison) the result is only poison if the selected true/false value is poison. Thus, when doing DAG combines that turn SELECT into arithmetic/logical operations (e.g. AND/OR) we need to make sure that the new operations aren't more poisonous. One way to do that is to use FREEZE to make sure the operands aren't poison. This patch aims at fixing the kind of miscompiles reported in https://github.com/llvm/llvm-project/issues/84653 and https://github.com/llvm/llvm-project/issues/85190 Solution is to make sure that we insert FREEZE, if needed to make the fold sound, when using the foldBoolSelectToLogic and foldVSelectToSignBitSplatMask DAG combines.	2024-07-22 17:19:46 +02:00
Bjorn Pettersson	8ebe7e60f5	[DAGCombiner] Push freeze through SETCC and SELECT_CC (#64718 ) Allow pushing freeze through SETCC and SELECT_CC even if there are multiple "maybe poison" operands. In the past we have limited it to a single "maybe poison" operand, but it seems profitable to also allow the multiple operand scenario. One goal here is to avoid some regressions seen in review of https://github.com/llvm/llvm-project/pull/84924 when solving the select->and miscompiles described in https://github.com/llvm/llvm-project/issues/84653	2024-07-22 16:01:59 +02:00
Simon Pilgrim	f406d83d95	[DAG] widenCtPop - reuse existing SDLoc. NFC.	2024-07-22 11:24:23 +01:00
Joseph Huber	615b7eeaa9	Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 )" This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5. I moved the `ISD` dependencies into the CodeGen portion of the handling, it's a little awkward but it's the easiest solution I can think of for now.	2024-07-20 09:29:31 -05:00
NAKAMURA Takumi	740161a9b9	Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 )" This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610	2024-07-20 12:36:57 +09:00
Simon Pilgrim	497ea1d849	[DAG] tryToFoldExtendSelectLoad - reuse existing SDLoc. NFC.	2024-07-18 16:19:15 +01:00
Lawrence Benson	177ce1900f	[LLVM] Add `llvm.experimental.vector.compress` intrinsic (#92289 ) This PR adds a new vector intrinsic `@llvm.experimental.vector.compress` to "compress" data within a vector based on a selection mask, i.e., it moves all selected values (i.e., where `mask[i] == 1`) to consecutive lanes in the result vector. A `passthru` vector can be provided, from which remaining lanes are filled. The main reason for this is that the existing `@llvm.masked.compressstore` has very strong constraints in that it can only write values that were selected, resulting in guard branches for all targets except AVX-512 (and even there the AMD implementation is _very_ slow). More instruction sets support "compress" logic, but only within registers. So to store the values, an additional store is needed. But this combination is likely significantly faster on many targets as it avoids branches. In follow-up PRs, my plan is to add target-specific lowerings for x86, SVE, and possibly RISCV. I also want to combine this with a store instruction, as this is probably a common case and we can avoid some memory writes in that case. See [discussion in forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663) for initial discussion on the design.	2024-07-17 14:24:24 +02:00
Joseph Huber	c05126bdfc	[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 ) Summary: The LTO pass and LLD linker have logic in them that forces extraction and prevents internalization of needed runtime calls. However, these currently take all RTLibcalls into account, even if the target does not support them. The target opts out of a libcall if it sets its name to nullptr. This patch pulls this logic out into a class in the header so that LTO / lld can use it to determine if a symbol actually needs to be kept. This is important for targets like AMDGPU that want to be able to use `lld` to perform the final link step, but do not want the overhead of uncalled functions. (This adds like a second to the link time trivially)	2024-07-16 06:22:09 -05:00
Simon Pilgrim	290537238b	[X86] visitADDLike - pull out repeated SDLoc. NFC.	2024-07-15 17:20:58 +01:00
Simon Pilgrim	3560e1d0ce	[DAG] visitADDLike - convert (A-B)+(C-D) --> (A+C)-(B+D) fold to sd_match. NFC.	2024-07-15 17:20:58 +01:00
Simon Pilgrim	ba8792b667	[X86] visitFCOPYSIGN - pull out repeated SDLoc. NFC.	2024-07-15 16:34:21 +01:00
Simon Pilgrim	61a4e1e70f	[DAG] Add SDPatternMatch::m_SetCC and update some combines to use it (#98646 ) The plan is to add more TernaryOp in the future (SELECT/VSELECT and FMA in particular)	2024-07-14 17:18:43 +01:00
Kazu Hirata	66cd2e0f9a	[CodeGen] Use range-based for loops (NFC) (#98706 )	2024-07-13 13:29:47 -07:00
AtariDreams	4f8b2fff6d	[DAG] Use break instead of continue to leave do while (false) loop (NFC) (#97966 )	2024-07-10 20:51:06 +04:00
Craig Topper	33112cbf59	[DAGCombiner] Remove unnecessary assert from getShiftAmountTy wrapper. NFC The same assert appears in the TargetLowering function. Refine comment to describe as a convenience wrapper and leave it to TargetLowering documentation to explain.	2024-07-04 19:05:54 -07:00
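
The ABDS/ABDU legalization commits in the log above (b1234ddbe2, reapplied as 13d04fa560) quote the expansion abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)). The following is a minimal Python sketch of that identity on fixed-width unsigned integers, not the LLVM implementation. It assumes the overflow result is used as an all-ones mask when the subtraction borrows; the function names and the 8-bit default width are illustrative only.

```python
def abdu_expanded(lhs: int, rhs: int, bits: int = 8) -> int:
    """Unsigned absolute difference via the sub/xor/sub expansion.

    Assumption: usub_overflow(lhs, rhs) is modelled as an all-ones mask
    when lhs < rhs (the subtraction borrows) and as zero otherwise.
    """
    mask = (1 << bits) - 1
    diff = (lhs - rhs) & mask              # sub(lhs, rhs), wrapped to `bits`
    borrow = mask if lhs < rhs else 0      # usub_overflow as 0 or all-ones
    # xor with all-ones then subtracting all-ones negates diff (two's
    # complement); with a zero mask both steps leave diff unchanged.
    return ((diff ^ borrow) - borrow) & mask


def abdu_reference(lhs: int, rhs: int) -> int:
    """Plain mathematical definition, used only for checking."""
    return abs(lhs - rhs)


def check_all(bits: int = 8) -> bool:
    """Exhaustively compare the expansion against the reference."""
    limit = 1 << bits
    return all(
        abdu_expanded(a, b, bits) == abdu_reference(a, b)
        for a in range(limit)
        for b in range(limit)
    )
```

Calling `check_all(8)` compares every pair of 8-bit inputs, which is cheap enough to run as a sanity check; the Alive2 link in the commit message is the authoritative proof for the real fold.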

3891 Commits
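
Commit 0d074ba197 in the log above describes three saturated-truncate nodes by the signedness of the operand and the value range of the destination type. As a rough illustration of those three ranges, here is a Python sketch that clamps an integer into the destination range. It models only the arithmetic described in the commit message; the function names are hypothetical and do not correspond to LLVM APIs.

```python
def truncate_ssat_s(value: int, dst_bits: int) -> int:
    """Signed operand clamped to the signed range of the destination."""
    lo = -(1 << (dst_bits - 1))
    hi = (1 << (dst_bits - 1)) - 1
    return max(lo, min(hi, value))


def truncate_ssat_u(value: int, dst_bits: int) -> int:
    """Signed operand clamped to the unsigned range of the destination."""
    hi = (1 << dst_bits) - 1
    return max(0, min(hi, value))


def truncate_usat_u(value: int, dst_bits: int) -> int:
    """Unsigned operand clamped to the unsigned range of the destination."""
    if value < 0:
        raise ValueError("operand is treated as unsigned and must be >= 0")
    hi = (1 << dst_bits) - 1
    return min(hi, value)
```

For example, under these assumptions `truncate_ssat_s(300, 8)` clamps to the largest signed 8-bit value, while `truncate_ssat_u(-5, 8)` clamps to zero.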