llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	2b1dfd2b35	[DAG] Replace getValidShiftAmountConstant helpers with getValidShiftAmount helpers to support KnownBits analysis (#93182 ) The getValidShiftAmountConstant/getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant helpers only worked with constant shift amounts, which could be problematic after type legalization (e.g. v2i64 might be partially scalarized or split into v4i32 on some targets such as 32-bit x86, Thumb2 MVE). This patch proposes we generalize these helpers to work with ConstantRange+KnownBits if a scalar/buildvector constant isn't available. Most restrictions are the same - the helper fails if any shift amount is out of bounds, getValidShiftConstant must be a specific constant uniform etc. However, getValidMinimumShiftAmount/getValidMaximumShiftAmount now can return bounds values that aren't values in the actual data, as they are based off the common KnownBits of every vector element. This addresses feedback on #92096	2024-06-01 16:48:26 +01:00
Matt Arsenault	aef0bdd36d	DAG: Preserve flags when expanding fminimum/fmaximum (#93550 ) The operation selection logic here doesn't really work when vector types need to be split. This was also dropping the flags, and losing nnan made the combine from select back to fmin/fmax unrecoverable. Preserve the flags to assist a future commit.	2024-05-29 12:26:27 +02:00
Simon Pilgrim	8a395b00b8	[DAG] Use auto* for cast/dyn_cast results (style). NFC.	2024-05-28 10:03:04 +01:00
Craig Topper	a1c9b9673c	[SelectionDAG][RISCV][VE] Rename VP_ASHR->VP_SRA VP_LSHR->VP_SRL. (#93221 ) This maintains consistency with the non-VP ISD opcodes.	2024-05-24 09:03:19 -07:00
Simon Pilgrim	aefd2572a5	[DAG][X86] expandABD - add branchless abds/abdu expansion for 0/-1 comparison result cases (#92780 ) If the comparison results are allbits masks, we can expand as `abd(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))`, replacing a sub+sub+select pattern with the simpler sub+xor+sub pattern. This allows us to remove a lot of X86 specific legalization code, and will be useful in future generic expansion for the legalization work in #92576 Alive2: https://alive2.llvm.org/ce/z/sj863C	2024-05-23 11:07:35 +01:00
Yingwei Zheng	c8dc6b59d6	[SDAG] Improve `SimplifyDemandedBits` for mul (#90034 ) If the RHS is a constant with X trailing zeros, then the X MSBs of the LHS are not demanded. Alive2: https://alive2.llvm.org/ce/z/F5CyJW Fixes https://github.com/llvm/llvm-project/issues/56645.	2024-05-22 22:43:10 +08:00
Yingwei Zheng	cf128305bd	[SDAG] Don't treat ISD::SHL as a uniform binary operator in `ShrinkDemandedOp` (#92753 ) In `TargetLowering::ShrinkDemandedOp`, types of lhs and rhs may differ before legalization. In the original case, `VT` is `i64` and `SmallVT` is `i32`, but the type of rhs is `i8`. Then invalid truncate nodes will be created. See the description of ISD::SHL for further information: > After legalization, the type of the shift amount is known to be TLI.getShiftAmountTy(). Before legalization, the shift amount can be any type, but care must be taken to ensure it is large enough. `605ae4e93b/llvm/include/llvm/CodeGen/ISDOpcodes.h (L691-L712)` This patch stops handling ISD::SHL in `TargetLowering::ShrinkDemandedOp` and duplicates the logic in `TargetLowering::SimplifyDemandedBits`. Additionally, it adds some additional checks like `isNarrowingProfitable` and `isTypeDesirableForOp` to improve the codegen on AArch64. Fixes https://github.com/llvm/llvm-project/issues/92720.	2024-05-22 20:20:33 +08:00
Simon Pilgrim	bbc4c2e047	[DAG] SimplifyDemandedBits - ensure we have simplified the shift operands before folding to AVG Pulled out of #92096 - ensure we have completed a topological simplification of the SRA/SRL shift operands before we try to combine to an AVG node, as it's difficult to later simplify through AVG nodes.	2024-05-22 11:55:03 +01:00
Simon Pilgrim	117d755b1b	[DAG] SimplifyDemandedBits - use ComputeKnownBits instead of getValidShiftAmountConstant to check for constant shift amounts. (#92412 ) This allows us to handle cases where the constant has already been type legalized behind a bitcast. Despite calling ComputeKnownBits, I'm not seeing any notable change in compile time.	2024-05-16 17:04:30 +01:00
Simon Pilgrim	311339e25c	[DAG] SimplifyDemandedBits - ISD::AND - only request DemandedElts when looking for a splat constant Limit the isConstOrConstSplat call to the vector elements we care about Noticed while investigating regressions in #92096	2024-05-16 13:05:35 +01:00
PiJoules	19008d3218	[llvm] Support fixed point multiplication on AArch64 (#84237 ) Prior to this, fixed point multiplication would lead to this assertion error on AArch64, armv8, and armv7. ``` _Accum f(_Accum x, _Accum y) { return x * y; } // ./bin/clang++ -ffixed-point /tmp/test2.cc -c -S -o - -target aarch64 -O3 clang++: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:10245: void llvm::TargetLowering::forceExpandWideMUL(SelectionDAG &, const SDLoc &, bool, EVT, const SDValue, const SDValue, const SDValue, const SDValue, SDValue &, SDValue &) const: Assertion `Ret.getOpcode() == ISD::MERGE_VALUES && "Ret value is a collection of constituent nodes holding result."' failed. ``` This path into forceExpandWideMUL should only be taken if we don't support [US]MUL_LOHI or MULH[US] for the operand size (32 in this case). But we should also check if we can just leverage regular wide multiplication. That is, extend the operands from 32 to 64, do a regular 64-bit mul, then trunc and shift. These ops are certainly available on aarch64 but for wider types.	2024-05-14 11:23:45 -07:00
Matt Arsenault	6a8d30b1c1	DAG: Skip 0 sign handling in minimum/maximum lowering for _ieee case (#91326 ) dc9664a8adae17f2083fbcc8e96cfce606c56d57 changed the documentation to assume these order -0 as less than +0.	2024-05-09 14:41:13 +02:00
Jinsong Ji	2dade0041a	[Analysis] Attribute Range should not prevent tail call optimization (#91122 ) - Remove Range attr when comparing for tailcall - Add test for tailcall with range	2024-05-07 22:02:10 -04:00
Min-Yih Hsu	539f626ecd	[VP][RISCV] Add vp.cttz.elts intrinsic and its RISC-V codegen (#90502 ) This intrinsic is the VP version of `experimental.cttz.elts`.	2024-04-30 09:27:10 -07:00
Qiu Chaofan	4a8f2f2e1a	[Legalizer] Expand fmaximum and fminimum (#67301 ) According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN. Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.	2024-04-29 15:09:54 +08:00
fengfeng	36230f90ee	[SelectionDAG] Propagate Disjoint flag. (#88370 ) Signed-off-by: feng.feng <feng.feng@iluvatar.com>	2024-04-15 11:01:15 +02:00
Björn Pettersson	33e6b488be	[SelectionDAG] Fix and improve TargetLowering::SimplifySetCC (#87646 ) The load narrowing part of TargetLowering::SimplifySetCC is updated according to this: 1) The offset calculation (for big endian) did not work properly for non byte-sized types. This is basically solved by an early exit if the memory type isn't byte-sized. But the code is also corrected to use the store size when calculating the offset. 2) To still allow some optimizations for non-byte-sized types the TargetLowering::isPaddedAtMostSignificantBitsWhenStored hook is added. By default it assumes that scalar integer types are padded starting at the most significant bits, if the type needs padding when being stored to memory. 3) Allow optimizing when isPaddedAtMostSignificantBitsWhenStored is true, as that hook makes it possible for TargetLowering to know how the non byte-sized value is aligned in memory. 4) Update the algorithm to always search for a narrowed load with a power-of-2 byte-sized type. In the past the algorithm started with the width of the original load, and then divided it by two for each iteration. But for a type such as i48 that would just end up trying to narrow the load into a i24 or i12 load, and then we would fail sooner or later due to not finding a newVT that fulfilled newVT.isRound(). With this new approach we can narrow the i48 load into either an i8, i16 or i32 load. By checking if such a load is allowed (e.g. alignment wise) for any "multiple of 8 offset", then we can find more opportunities for the optimization to trigger. So even for a byte-sized type such as i32 we may now end up narrowing the load into loading the 16 bits starting at offset 8 (if that is allowed by the target). The old algorithm did not even consider that case. 5) Also start using getObjectPtrOffset instead of getMemBasePlusOffset when creating the new ptr. This way we get "nsw" on the add.	2024-04-12 16:18:12 +02:00
Jay Foad	1b761205f2	[APInt] Add a simpler overload of multiplicativeInverse (#87610 ) The current APInt::multiplicativeInverse takes a modulus which can be any value, but all in-tree callers use a power of two. Moreover, most callers want to use two to the power of the width of an existing APInt, which is awkward because 2^N is not representable as an N-bit APInt. Add a new overload of multiplicativeInverse which implicitly uses 2^BitWidth as the modulus.	2024-04-04 16:11:06 +01:00
Simon Pilgrim	2d0087424f	[DAG] Remove extract_vector_elt(freeze(x)), idx -> freeze(extract_vector_elt(x), idx) fold (#87480 ) Reverse the fold with handling inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds. This exposed a number of regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve demandedelts support to handle its undef upper elements. There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads? Fixes #86968	2024-04-04 11:10:55 +01:00
aniplcc	d650fcd6bf	[DAG] SimplifyDemandedVectorElts - add ISD::AVGCEILS/AVGCEILU/AVGFLOORS/AVGFLOORU nodes (#86284 ) Fixes #84768	2024-04-03 15:00:50 +01:00
Wang Pengcheng	610b9e23c5	[SDAG] Use shifts if ISD::MUL is illegal when lowering ISD::CTPOP (#86505 ) We can avoid libcalls. Fixes #86205	2024-03-29 15:38:39 +08:00
Owen Anderson	7c9b5228da	Only check assertions that were meant to apply to the normal case of non-splat vector SREM expansion when we aren't hitting the special case. (#86238 ) Fixes https://github.com/llvm/llvm-project/issues/84830 Introduced in https://github.com/llvm/llvm-project/pull/82706	2024-03-23 21:49:29 -05:00
Simon Pilgrim	e4fa2e3562	[DAG] isGuaranteedNotToBeUndefOrPoisonForTargetNode - add fallback implementation (#86125 ) Allow targets to rely on TargetLowering::isGuaranteedNotToBeUndefOrPoisonForTargetNode to test nodes for canCreateUndefOrPoisonForTargetNode + all arguments are isGuaranteedNotToBeUndefOrPoison. Targets can still perform this themselves for specific special case nodes (e.g. target shuffles). Matches the fallback in SelectionDAG::isGuaranteedNotToBeUndefOrPoison	2024-03-21 15:11:59 +00:00
Benjamin Kramer	5f5a64134b	Revert "[DAGCombiner] Simplifying `{si|ui}tofp` when only signbit is needed" This reverts commit 353fbeb0a294d2c7cef6d88607fa0fd50ee81462. It crashes when it encounters an UINT_TO_FP. llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1618 in SDValue llvm::SelectionDAG::getConstant(const ConstantInt &, const SDLoc &, EVT, bool, bool): VT.isInteger() && "Cannot create FP integer constant!"	2024-03-20 15:08:37 +01:00
Noah Goldstein	353fbeb0a2	[DAGCombiner] Simplifying `{si|ui}tofp` when only signbit is needed If we only need the signbit, `uitofp` simplifies to 0 and `sitofp` simplifies to `bitcast`. Closes #85138	2024-03-19 17:17:35 -05:00
Craig Topper	23323e2837	[TargetLowering][RISCV] Propagate fastmath flags for the vector operations emitted in expandVecReduce. (#85164 ) We used the fastmath flags for any scalar ops created, but not vector.	2024-03-14 08:39:32 -07:00
Arthur Eubanks	94c988bcfd	[NFC] Remove unused parameter from shouldAssumeDSOLocal()	2024-03-11 19:48:17 +00:00
Noah Goldstein	61c06775c9	[KnownBits] Add API for `nuw` flag in `computeForAddSub`; NFC	2024-03-05 12:59:58 -06:00
Owen Anderson	2c5a68858b	Fix non-splat vector SREM expansion when one of the divisors is a power of two. (#82706 ) The expansion previously used, derived from Hacker's Delight, does not work correctly when the dividend is INT_MIN and the divisor is a power of two. We now use an alternate derivation of the A and Q constants specifically for the power-of-two divisor case to avoid this problem. Credit to Fabian Giesen for the new derivation. Fixes https://github.com/llvm/llvm-project/issues/77169	2024-02-25 10:13:05 -05:00
David Majnemer	be36812fb7	[TargetLowering] Be more efficient in fp -> bf16 NaN conversions We can avoid masking completely as it is OK (and probably preferable) to bring over some of the existing NaN payload.	2024-02-21 22:47:27 +00:00
David Majnemer	9eff001d3d	[TargetLowering] Correctly yield NaN from FP_TO_BF16 We didn't set the exponent field, resulting in tiny numbers instead of NaNs.	2024-02-21 22:17:02 +00:00
David Majnemer	ddc0f1d8fe	[TargetLowering] Actually add the adjustment to the significand The logic was supposed to be choosing between {0, 1, -1} as an adjustment to the FP bit pattern. However, the adjustment itself was used as the bit pattern instead, which produced garbage results.	2024-02-21 19:34:11 +00:00
David Majnemer	cc13f3ba45	Correctly round FP -> BF16 when SDAG expands such nodes (#82399 ) We did something pretty naive: - round FP64 -> BF16 by first rounding to FP32 - skip FP32 -> BF16 rounding entirely - taking the top 16 bits of a FP32 which will turn some NaNs into infinities Let's do this in a more principled way by rounding types with more precision than FP32 to FP32 using round-inexact-to-odd which will negate double rounding issues.	2024-02-21 12:37:02 -05:00
Craig Topper	d485317357	[TargetLowering] Emit SIGN_EXTEND_INREG instead of shift pair from optimizeSetCCOfSignedTruncationCheck. (#81785 ) sext_inreg is our canonical form of shift pair before op legalization so DAG combiner will probably create it anyway. If it isn't legal LegalizeDAG will expand to shifts later.	2024-02-15 09:24:02 -08:00
David Green	2e3de997ab	[DAG] Generalize setcc(setcc) fold to use known bits. If we have a `SETCC (SETCC), 0, NE` and ZeroOrOneBooleanContent, we can remove the outer setcc as it will produce the same value as the inner. This can be generalized to anything where the top bits are known to be 0, as the value will remain as 1 or 0.	2024-02-06 12:39:48 +00:00
Craig Topper	f72da9f4fd	[SelectionDAG] Use getShiftAmountConstant to simplify code. NFC (#80561 ) Replace calls to getShiftAmountTy+getConstant with getShiftAmountConstant.	2024-02-04 16:05:14 -08:00
Kazu Hirata	39fa304866	[llvm] Use StringRef::starts_with (NFC)	2024-01-31 23:54:07 -08:00
PiJoules	a356e6ccad	[SelectionDAG] Expand fixed point multiplication into libcall (#79352 ) 32-bit ARMv6 with thumb doesn't support MULHS/MUL_LOHI as legal/custom nodes during expansion which will cause fixed point multiplication of _Accum types to fail with fixed point arithmetic. Prior to this, we just happen to use fixed point multiplication on platforms that happen to support these MULHS/MUL_LOHI. This patch attempts to check if the multiplication can be done via libcalls, which are provided by the arm runtime. These libcall attempts are made elsewhere, so this patch refactors that libcall logic into its own functions and the fixed point expansion calls and reuses that logic.	2024-01-30 13:58:55 -08:00
Philip Reames	0fc5f4b524	[DAG] Set nneg flag when forming zext in demanded bits (#72281 ) We do the same for the analogous transform in DAGCombine, but this case was missed in the recent patch which added support for zext nneg. Sorry for the lack of test coverage. Not sure how to exercise this piece of logic. It appears to have only minimal impact on LIT tests (only test/CodeGen/X86/wide-scalar-shift-by-byte-multiple-legalization.ll), and even then, the changes without it appear uninteresting. Maybe we should remove this transform instead?	2024-01-18 07:34:08 -08:00
Alex Bradbury	2d54ec36f7	[SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (#77455 ) This is the logical equivalent for #76710 for APInt and uses the same naming scheme. Converted existing users through: `git grep -l "cast<ConstantSDNode>\(.\).getAPIntValueValue" | xargs sed -E -i 's/cast<ConstantSDNode>\((.*)\)->getAPIntValue/\1->getAsAPIntVal/'`	2024-01-09 14:27:07 +00:00
Simon Pilgrim	d460c1de3b	[DAG] SimplifyDemandedBits - don't fold sext(x) -> aext(x) if we lose an 0/-1 allsignbits mask (#77296 ) For targets that use 0/-1 boolean results, we want to keep this pattern through extensions/truncations as much as possible - so avoid simplifying to any_extend even if we don't demand the upper bits. Noticed in triage for https://reviews.llvm.org/D152928	2024-01-08 18:01:41 +00:00
Simon Pilgrim	f45b75949d	[DAG] SimplifyDemandedBits - call demanded elts variant directly for SELECT/SELECT_CC nodes. Don't rebuild the demanded elts mask every time.	2024-01-04 10:53:45 +00:00
Simon Pilgrim	72db578d71	[DAG] Fix typo in VSELECT SimplifyDemandedVectorElts handling. NFC. Rename UndefZero -> UndefSel (undefined elements from Sel operand).	2024-01-04 10:50:42 +00:00
David Green	771fd1ad2a	[DAG] Extend input types if needed in combineShiftToAVG. (#76791 ) This attempts to fix #76734, which is a crash from invalid TRUNC node types created for unoptimized input code in combineShiftToAVG. The NVT can be VT if the larger type was legal and the adds will not overflow, in which case the inputs should be extended. From what I can tell this appears to be valid (if not optimal for this case): https://alive2.llvm.org/ce/z/fRieHR The result has also been changed to getExtOrTrunc in case that VT==NVT, which is not handled by SEXT/ZEXT.	2024-01-03 10:52:01 +00:00
Craig Topper	bbd57e1832	[SelectionDAG] Add initial plumbing for the disjoint flag. (#76751 ) This copies the flag from IR to the SDNode in SelectionDAGBuilder, clears the flag in SimplifyDemandedBits, and adds it to canCreateUndefOrPoison. Uses of the flag will come in later patches.	2024-01-02 21:58:00 -08:00
Sander de Smalen	81b7f115fb	[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979 ) It seems TypeSize is currently broken in the sense that: TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8) without failing its assert that explicitly tests for this case: assert(LHS.Scalable == RHS.Scalable && ...); The reason this fails is that `Scalable` is a static method of class TypeSize, and LHS and RHS are both objects of class TypeSize. So this is evaluating if the pointer to the function Scalable == the pointer to the function Scalable, which is always true because LHS and RHS have the same class. This patch fixes the issue by renaming `TypeSize::Scalable` -> `TypeSize::getScalable`, as well as `TypeSize::Fixed` to `TypeSize::getFixed`, so that it no longer clashes with the variable in FixedOrScalableQuantity. The new methods now also better match the coding standard, which specifies that: * Variable names should be nouns (as they represent state) * Function names should be verb phrases (as they represent actions)	2023-11-22 08:52:53 +00:00
Simon Pilgrim	98efa8f9aa	[DAG] Fix ShrinkDemandedOp doxygen description to match behaviour. NFC. ShrinkDemandedOp checks for both isTruncateFree AND isZExtFree but extends with ANY_EXTEND.	2023-11-18 22:44:08 +00:00
Tavian Barnes	75cf672b12	[SDAG] Simplify is-power-of-2 codegen (#72275 ) When x is not known to be nonzero, ctpop(x) == 1 is expanded to x != 0 && (x & (x - 1)) == 0 resulting in codegen like leal -1(%rdi), %eax testl %eax, %edi sete %cl testl %edi, %edi setne %al andb %cl, %al But another expression that works is (x ^ (x - 1)) > x - 1 which has nicer codegen: leal -1(%rdi), %eax xorl %eax, %edi cmpl %eax, %edi seta %al	2023-11-15 22:26:34 +09:00
Yingwei Zheng	650026897c	[RISCV][SDAG] Prefer ShortForwardBranch to lower sdiv by pow2 (#67364 ) This patch lowers `sdiv x, +/-2**k` to `add + select + shift` when the short forward branch optimization is enabled. The latter inst seq performs faster than the seq generated by target-independent DAGCombiner. This algorithm is described in Hacker's Delight. This patch also removes duplicate logic in the X86 and AArch64 backend. But we cannot do this for the PowerPC backend since it generates a special instruction `addze`.	2023-11-10 21:38:47 +08:00
Craig Topper	70b35ec0a8	[SelectionDAG] Add initial support for nneg flag on ISD::ZERO_EXTEND. (#70872 ) This adds the nneg flag to SDNodeFlags and the node printing code. SelectionDAGBuilder will add this flag to the node if the target doesn't prefer sign extend. A future RISC-V patch can remove the sign extend preference from SelectionDAGBuilder. I've also added the flag to the DAG combine that converts ISD::SIGN_EXTEND to ISD::ZERO_EXTEND.	2023-11-03 11:15:08 -07:00
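
Commit 75cf672b12 (#72275) above replaces the `x != 0 && (x & (x - 1)) == 0` expansion of `ctpop(x) == 1` with `(x ^ (x - 1)) > x - 1`. The sketch below restates both expressions on plain 32-bit unsigned integers so they can be compared. The function names are hypothetical and this is not the SelectionDAG code itself.

```cpp
#include <cassert>
#include <cstdint>

// Original expansion: explicit zero check plus "clear lowest set bit" test.
constexpr bool isPow2Classic(uint32_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Expression from the commit message, using an unsigned comparison.
// For x == 0, x - 1 wraps to all ones, the XOR is also all ones,
// and the strict comparison is false, so no separate zero check is needed.
constexpr bool isPow2Xor(uint32_t x) {
  return (x ^ (x - 1)) > (x - 1);
}

static_assert(isPow2Xor(8u) && !isPow2Xor(0u), "spot check at compile time");
```

The second form needs one subtraction, one XOR and one compare, which matches the shorter instruction sequence quoted in the commit message.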
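
Commit aefd2572a5 (#92780) above gives the branchless expansion `abd(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))` for targets whose comparisons produce 0/-1 masks. A minimal scalar sketch of that identity follows, assuming 32-bit operands and wrapping arithmetic. The helper names are hypothetical, and the mask is built by hand to stand in for the target's all-bits compare result.

```cpp
#include <cassert>
#include <cstdint>

// Signed absolute difference; the result is returned as unsigned because
// |lhs - rhs| can exceed INT32_MAX.
inline uint32_t abdsBranchless(int32_t lhs, int32_t rhs) {
  // 0xFFFFFFFF when lhs > rhs, otherwise 0 (models a 0/-1 compare result).
  uint32_t cmp = 0u - static_cast<uint32_t>(lhs > rhs);
  // Wrapping subtraction, done in unsigned to avoid signed overflow.
  uint32_t diff = static_cast<uint32_t>(lhs) - static_cast<uint32_t>(rhs);
  // cmp == 0:  0 - diff          == rhs - lhs
  // cmp == -1: -1 - (~diff)      == diff == lhs - rhs
  return cmp - (diff ^ cmp);
}

// Unsigned variant: same pattern with an unsigned comparison.
inline uint32_t abduBranchless(uint32_t lhs, uint32_t rhs) {
  uint32_t cmp = 0u - static_cast<uint32_t>(lhs > rhs);
  uint32_t diff = lhs - rhs;
  return cmp - (diff ^ cmp);
}
```

The sub+xor+sub chain replaces the sub+sub+select pattern mentioned in the commit message; the select is folded into the arithmetic through the mask.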
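
Commit 1b761205f2 (#87610) above adds a `multiplicativeInverse` overload that implicitly uses 2^BitWidth as the modulus. As an illustration of what such an inverse is, the sketch below computes it for a fixed 32-bit width with a Newton iteration. This is a standard technique chosen here for the example and is an assumption, not a description of the APInt implementation; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns x such that a * x == 1 (mod 2^32). Only odd values are invertible.
inline uint32_t mulInverseMod2_32(uint32_t a) {
  assert((a & 1u) && "only odd values have an inverse modulo 2^32");
  // For odd a, a * a == 1 (mod 8), so a is its own inverse in the low 3 bits.
  uint32_t x = a;
  // Each Newton step x <- x * (2 - a * x) doubles the number of correct
  // low bits: 3 -> 6 -> 12 -> 24 -> 48, so four steps cover 32 bits.
  for (int i = 0; i < 4; ++i)
    x *= 2u - a * x;
  return x;
}
```

Because unsigned arithmetic already wraps modulo 2^32, the modulus never has to be represented explicitly, which mirrors the motivation given in the commit message (2^N does not fit in an N-bit value).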
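
Commit c8dc6b59d6 (#90034) above states that when the RHS of a multiply is a constant with X trailing zeros, the X most significant bits of the LHS are not demanded. The sketch below checks that claim on 32-bit integers: clearing those top bits of the LHS leaves the wrapped product unchanged. The helper name is hypothetical and the code only illustrates the arithmetic fact, not the `SimplifyDemandedBits` logic.

```cpp
#include <cassert>
#include <cstdint>

// Mask of LHS bits that can still affect lhs * c (mod 2^32),
// assuming every bit of the product is demanded.
inline uint32_t demandedLhsMask(uint32_t c) {
  if (c == 0)
    return 0u; // the product is always zero, so no LHS bit matters
  unsigned tz = 0;
  while (((c >> tz) & 1u) == 0)
    ++tz; // count trailing zeros of the constant
  // The top tz bits of lhs are shifted out of the 32-bit product.
  return 0xFFFFFFFFu >> tz;
}
```

For example, with `c = 0x50` (four trailing zeros) the mask is `0x0FFFFFFF`, and `lhs * c` equals `(lhs & 0x0FFFFFFF) * c` for every 32-bit `lhs`.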


1468 Commits