Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.
abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
REAPPLIED: fix a regression where the "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failed after type legalization
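As a scalar C++ model of the expansion above (an illustration only, not the DAG code): the usub_overflow result, widened to an all-ones/zero mask, conditionally negates the wrapped difference.
```
#include <cassert>
#include <cstdint>

// Scalar model of
//   abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)),
//                         usub_overflow(lhs, rhs))
// where the usub_overflow flag is sign-extended to an all-ones/zero mask.
uint32_t abdu_expanded(uint32_t lhs, uint32_t rhs) {
  uint32_t diff = lhs - rhs;            // sub(lhs, rhs), wraps on borrow
  uint32_t ovf = lhs < rhs ? ~0u : 0u;  // usub_overflow, widened to a mask
  return (diff ^ ovf) - ovf;            // negates diff exactly when lhs < rhs
}

int main() {
  assert(abdu_expanded(7, 10) == 3);
  assert(abdu_expanded(10, 7) == 3);
  assert(abdu_expanded(5, 5) == 0);
}
```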
For a __thread variable x, when emulated TLS is enabled and there is an
access to x, the compiler first looks up the symbol __emutls_v.x within
the module. However, when y is an alias of x, the compiler still tries
to look up __emutls_v.y instead of __emutls_v.x. As a result, the lookup
returns a nullptr, causing the compiler to crash. The purpose of this MR
(Merge Request) is to ensure that, in emulated TLS, before checking
__emutls_v.y, the compiler first identifies which global value y is an
alias of.
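A minimal sketch of the setup being described, using hypothetical names and the GNU alias attribute; aliases here are an IR-level concept, so the exact source spelling that produced one in the original report may differ. With emulated TLS enabled, the access through y must be lowered via __emutls_v.x (the control variable of the aliasee), since no __emutls_v.y is emitted.
```
#include <cstdio>

// x is a __thread variable; y is declared as an alias of x.
__thread int x = 0;
extern __thread int y __attribute__((alias("x")));

// With -femulated-tls this access must resolve to __emutls_v.x.
int readThroughAlias() { return y; }

int main() {
  x = 42;
  std::printf("%d\n", readThroughAlias());  // prints 42
}
```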
This produces better code on x86_64 only in the unordered case. I'm not
sure what the exact condition should be to avoid the regression; a free
fabs might do it, or it might require legality checks for the alternative
integer expansion.
Use a simpler lowering for exact udivs in both SelectionDAG and
GlobalISel.
The algorithm is the same for unsigned exact divisions as for signed
divisions, save for using a logical rather than an arithmetic shift for
even divisors, according to Hacker's Delight, 2nd Edition, page 242.
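A scalar sketch of the technique under the stated reference (the helper names below are made up, not from the patch), assuming the division is known to be exact:
```
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd value modulo 2^32 via Newton-Raphson;
// each step doubles the number of correct low bits (3 -> 6 -> ... -> 96).
static uint32_t inverseMod2Pow32(uint32_t d) {
  uint32_t inv = d;  // d * d == 1 (mod 8), so the seed is correct to 3 bits
  for (int i = 0; i < 5; ++i)
    inv *= 2 - d * inv;
  return inv;
}

// Exact unsigned division by d: strip d's trailing zeros with a *logical*
// shift, then multiply by the inverse of its odd part.  The signed variant
// is identical except that it uses an arithmetic shift for even divisors.
uint32_t exactUDiv(uint32_t x, uint32_t d) {
  unsigned k = __builtin_ctz(d);  // trailing zero count of the divisor
  return (x >> k) * inverseMod2Pow32(d >> k);
}

int main() {
  assert(exactUDiv(300, 12) == 25);  // 300 is an exact multiple of 12
  assert(exactUDiv(0xFFFFFFF0u, 16) == 0x0FFFFFFFu);
}
```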
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.
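A scalar C++ model of the intended semantics on a fixed-width vector (reference behaviour only, not the lowering):
```
#include <array>
#include <cassert>
#include <cstddef>

// Reference semantics for the compress operation: selected lanes are
// packed into consecutive low lanes of the result, and the remaining
// lanes are taken from the passthru vector.
template <typename T, std::size_t N>
std::array<T, N> vectorCompress(const std::array<T, N> &vec,
                                const std::array<bool, N> &mask,
                                const std::array<T, N> &passthru) {
  std::array<T, N> result = passthru;
  std::size_t out = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[out++] = vec[i];  // move selected values to consecutive lanes
  return result;
}

int main() {
  std::array<int, 4> v{10, 20, 30, 40};
  std::array<bool, 4> m{true, false, true, false};
  std::array<int, 4> p{-1, -1, -1, -1};
  auto r = vectorCompress(v, m, p);
  assert(r[0] == 10 && r[1] == 30 && r[2] == -1 && r[3] == -1);
}
```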
The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many targets, as
it avoids branches.
In follow-up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.
See the [discussion on
Discourse](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for the initial design discussion.
The previous expansion of [US]CMP was done using two selects and two
compares. It produced decent code, but on many platforms it is better to
implement [US]CMP nodes by performing the following operation:
```
[us]cmp(x, y) = (x [us]> y) - (x [us]< y)
```
This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets that prefer the select-based expansion, but other targets may also start to use it if it provides better codegen.
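In scalar C++ terms (a sketch of the expansion above, not the DAG implementation), with the usual -1/0/+1 result:
```
#include <cassert>
#include <cstdint>

// scmp(x, y) = (x s> y) - (x s< y), yielding +1 / 0 / -1.
int8_t scmpExpanded(int32_t x, int32_t y) {
  return static_cast<int8_t>((x > y) - (x < y));
}

// ucmp(x, y) = (x u> y) - (x u< y), with unsigned comparisons.
int8_t ucmpExpanded(uint32_t x, uint32_t y) {
  return static_cast<int8_t>((x > y) - (x < y));
}

int main() {
  assert(scmpExpanded(-5, 3) == -1);
  assert(scmpExpanded(7, 7) == 0);
  assert(ucmpExpanded(0xFFFFFFFFu, 3) == 1);  // unsigned: 0xFFFFFFFF > 3
}
```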
When looking for the largest legal integer type for a target,
`TargetLowering::findOptimalMemOpLowering` assumes that `MVT::i64` is
the largest possible integer type. The patch removes this assumption and
uses `MVT::LAST_INTEGER_VALUETYPE` instead.
#97645 proposed to remove LegalTypes from getShiftAmountTy. This patch
removes it from getShiftAmountConstant, which is one of the callers of
getShiftAmountTy.
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, the behavior of llvm.minnum differs across platforms when one
operand is sNaN. When we compare sNaN vs NUM:
- ARM/AArch64/PowerPC: follow IEEE754-2008's minNum and return qNaN.
- RISC-V/Hexagon: follow IEEE754-2019's minimumNumber and return NUM.
- X86: returns NUM, but does not match IEEE754-2019's minimumNumber, as
  +0.0 is not always treated as greater than -0.0.
- MIPS/LoongArch/Generic: return NUM.
- LIBCALL: returns qNaN.
So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.
Half-fix: #93033
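A scalar C++ sketch of the IEEE754-2019 minimumNumber semantics the new intrinsic pins down (ignoring exception flags; this is only an illustration, not the lowering): a NaN operand, quiet or signaling, loses to a number, and -0.0 orders below +0.0.
```
#include <cassert>
#include <cmath>
#include <limits>

// IEEE754-2019 minimumNumber: a NaN operand loses to a number, both-NaN
// returns qNaN, and -0.0 is treated as smaller than +0.0.
double minimumNumber(double a, double b) {
  if (std::isnan(a))
    return std::isnan(b) ? std::numeric_limits<double>::quiet_NaN() : b;
  if (std::isnan(b))
    return a;
  if (a == b)  // only the +0.0 vs -0.0 case needs the sign check
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}

int main() {
  assert(minimumNumber(std::nan(""), 1.0) == 1.0);  // NaN vs NUM -> NUM
  assert(std::signbit(minimumNumber(+0.0, -0.0)));  // picks -0.0
}
```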
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison
intrinsics in the SelectionDAG. Some of the expansions/lowerings
are not optimal yet.
Converting to avgfloor and then expanding it back to shift+add later is likely to prevent other folds (re-association and value-tracking in particular) in the meantime.
Fixes #95284
Always match AVG patterns pre-legalization, and use TargetLowering::expandAVG to expand again during legalization.
I've removed the X86 custom AVGCEILU pattern detection and replaced it with combines that try to convert other AVG nodes to AVGCEILU.
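For reference, the shift/logic identities that the AVG nodes correspond to, shown as a scalar sketch for the unsigned flavours (the signed forms are analogous; whether expandAVG uses exactly this form depends on the target and type):
```
#include <cassert>
#include <cstdint>

// avgflooru(a, b): (a + b) / 2 rounded down, without a wider intermediate.
uint8_t avgflooru(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a & b) + ((a ^ b) >> 1));
}

// avgceilu(a, b): (a + b + 1) / 2, i.e. rounded up.
uint8_t avgceilu(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a | b) - ((a ^ b) >> 1));
}

int main() {
  assert(avgflooru(250, 255) == 252);  // (250 + 255) / 2 rounds down to 252
  assert(avgceilu(250, 255) == 253);   // rounds up to 253
  assert(avgceilu(4, 4) == 4);
}
```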
First, expandFMINIMUM_FMAXIMUM should be a never-fail API. The client
wanted it expanded, and it can always be expanded. This logic was tied
up with what the VectorLegalizer wanted.
Prefer using the min/max opcodes, and unrolling if we don't have a
vselect.
This seems to produce better code in all the changed tests.
The getValidShiftAmountConstant/getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant helpers only worked with constant shift amounts, which could be problematic after type legalization (e.g. v2i64 might be partially scalarized or split into v4i32 on some targets such as 32-bit x86, Thumb2 MVE).
This patch proposes we generalize these helpers to work with ConstantRange+KnownBits if a scalar/buildvector constant isn't available.
Most restrictions are the same: the helper fails if any shift amount is out of bounds, getValidShiftConstant must be a specific uniform constant, etc.
However, getValidMinimumShiftAmount/getValidMaximumShiftAmount now can return bounds values that aren't values in the actual data, as they are based off the common KnownBits of every vector element.
This addresses feedback on #92096
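As a plain-C++ sketch of where such bounds come from (not the helper code itself; LLVM's KnownBits min/max value queries embody the same idea): given known-zero/known-one masks common to every element, the smallest consistent value sets only the known-one bits and the largest sets every bit not known to be zero.
```
#include <cassert>
#include <cstdint>

// Tightest value range consistent with a KnownBits-style (Zero, One) pair.
struct Bounds {
  uint64_t Min, Max;
};

Bounds boundsFromKnownBits(uint64_t knownZero, uint64_t knownOne) {
  // Minimum: unknown bits all clear.  Maximum: unknown bits all set.
  return {knownOne, ~knownZero};
}

int main() {
  // Shift amounts known to look like 0b??10: bit 1 known one, bits 0 and
  // 4..63 known zero, bits 2-3 unknown.
  uint64_t knownZero = ~uint64_t{0xF} | 0x1;
  uint64_t knownOne = 0x2;
  Bounds b = boundsFromKnownBits(knownZero, knownOne);
  assert(b.Min == 2 && b.Max == 14);  // the amount lies in [2, 14]
}
```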
The operation selection logic here doesn't really work when vector types
need to be split. This was also dropping the flags, and losing nnan made
the combine from select back to fmin/fmax unrecoverable. Preserve the
flags to assist a future commit.
If the comparison results are allbits masks, we can expand as `abd(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))`, replacing a sub+sub+select pattern with the simpler sub+xor+sub pattern.
This allows us to remove a lot of X86 specific legalization code, and will be useful in future generic expansion for the legalization work in #92576
Alive2: https://alive2.llvm.org/ce/z/sj863C
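A scalar model of this variant (illustration only), with the comparison producing an all-ones or all-zero mask as described:
```
#include <cassert>
#include <cstdint>

// Scalar model of
//   abds(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))
// where cmpgt yields an all-bits mask (-1) when true and 0 when false.
int32_t abdsExpanded(int32_t lhs, int32_t rhs) {
  int32_t cmp = lhs > rhs ? -1 : 0;  // allbits comparison result
  int32_t diff = static_cast<int32_t>(static_cast<uint32_t>(lhs) -
                                      static_cast<uint32_t>(rhs));  // wrapping sub
  return cmp - (diff ^ cmp);
}

int main() {
  assert(abdsExpanded(-3, 5) == 8);
  assert(abdsExpanded(5, -3) == 8);
  assert(abdsExpanded(4, 4) == 0);
}
```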
In `TargetLowering::ShrinkDemandedOp`, the types of the lhs and rhs may
differ before legalization.
In the original case, `VT` is `i64` and `SmallVT` is `i32`, but the type
of the rhs is `i8`, so invalid truncate nodes would be created.
See the description of ISD::SHL for further information:
> After legalization, the type of the shift amount is known to be
TLI.getShiftAmountTy(). Before legalization, the shift amount can be any
type, but care must be taken to ensure it is large enough.
605ae4e93b/llvm/include/llvm/CodeGen/ISDOpcodes.h (L691-L712)
This patch stops handling ISD::SHL in `TargetLowering::ShrinkDemandedOp`
and duplicates the logic in `TargetLowering::SimplifyDemandedBits`.
Additionally, it adds checks like `isNarrowingProfitable` and
`isTypeDesirableForOp` to improve the codegen on AArch64.
Fixes https://github.com/llvm/llvm-project/issues/92720.
Pulled out of #92096 - ensure we have completed a topological simplification of the SRA/SRL shift operands before we try to combine to an AVG node, as it's difficult to simplify through AVG nodes later.
This allows us to handle cases where the constant has already been type legalized behind a bitcast.
Despite calling ComputeKnownBits, I'm not seeing any notable change in compile time.
Prior to this, fixed-point multiplication would lead to this assertion
error on AArch64, armv8, and armv7.
```
_Accum f(_Accum x, _Accum y) { return x * y; }
// ./bin/clang++ -ffixed-point /tmp/test2.cc -c -S -o - -target aarch64 -O3
clang++: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:10245: void llvm::TargetLowering::forceExpandWideMUL(SelectionDAG &, const SDLoc &, bool, EVT, const SDValue, const SDValue, const SDValue, const SDValue, SDValue &, SDValue &) const: Assertion `Ret.getOpcode() == ISD::MERGE_VALUES && "Ret value is a collection of constituent nodes holding result."' failed.
```
This path into forceExpandWideMUL should only be taken if we don't
support [US]MUL_LOHI or MULH[US] for the operand size (32 in this case).
But we should also check whether we can just leverage regular wide
multiplication: extend the operands from 32 to 64 bits, do a regular
64-bit mul, then trunc and shift. These ops are certainly available on
AArch64, just for wider types than the original operation.
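A scalar sketch of that fallback: widen, multiply, shift, truncate. The 15 fractional bits assumed here for a 32-bit _Accum are illustrative, not taken from the patch.
```
#include <cassert>
#include <cstdint>

// Fixed-point multiply of two 32-bit values with kFracBits fractional
// bits: extend both operands to 64 bits, do one ordinary 64-bit multiply,
// shift the product right by the scale, then truncate back to 32 bits.
constexpr unsigned kFracBits = 15;  // illustrative _Accum scale

int32_t fixedPointMul(int32_t x, int32_t y) {
  int64_t wide = static_cast<int64_t>(x) * static_cast<int64_t>(y);
  return static_cast<int32_t>(wide >> kFracBits);  // rescale and truncate
}

int main() {
  const int32_t one = 1 << kFracBits;   // 1.0 in fixed point
  const int32_t half = one / 2;         // 0.5
  assert(fixedPointMul(one, half) == half);            // 1.0 * 0.5 == 0.5
  assert(fixedPointMul(3 * one, half) == one + half);  // 3.0 * 0.5 == 1.5
}
```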
According to the LangRef, llvm.maximum/minimum has -0.0 < +0.0 semantics
and propagates NaN.
Expand the nodes on targets not supporting the operation, by adding an
extra check for NaN and using is_fpclass to check the sign of zeros.
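A scalar sketch of the llvm.maximum semantics being expanded (NaN propagates, +0.0 compares greater than -0.0); the explicit zero-sign check below plays the role that is_fpclass serves in the expansion:
```
#include <cassert>
#include <cmath>
#include <limits>

// Reference semantics for llvm.maximum: any NaN operand propagates as
// qNaN, and +0.0 compares greater than -0.0.
double fmaximumRef(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();  // propagate NaN
  if (a == b)  // only the +0.0 vs -0.0 case needs the sign check
    return std::signbit(a) ? b : a;
  return a > b ? a : b;
}

int main() {
  assert(std::isnan(fmaximumRef(1.0, std::nan(""))));
  assert(!std::signbit(fmaximumRef(-0.0, +0.0)));  // picks +0.0
}
```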
The load narrowing part of TargetLowering::SimplifySetCC is updated
as follows:
1) The offset calculation (for big endian) did not work properly for
non byte-sized types. This is basically solved by an early exit
if the memory type isn't byte-sized. But the code is also corrected
to use the store size when calculating the offset.
2) To still allow some optimizations for non-byte-sized types the
TargetLowering::isPaddedAtMostSignificantBitsWhenStored hook is
added. By default it assumes that scalar integer types are padded
starting at the most significant bits, if the type needs padding
when being stored to memory.
3) Allow optimizing when isPaddedAtMostSignificantBitsWhenStored is
true, as that hook makes it possible for TargetLowering to know
how the non byte-sized value is aligned in memory.
4) Update the algorithm to always search for a narrowed load with
a power-of-2 byte-sized type. In the past the algorithm started
with the width of the original load, and then divided it by
two for each iteration. But for a type such as i48 that would
just end up trying to narrow the load into an i24 or i12 load,
and then we would fail sooner or later due to not finding a
newVT that fulfilled newVT.isRound().
With this new approach we can narrow the i48 load into either
an i8, i16 or i32 load. By checking if such a load is allowed
(e.g. alignment wise) for any "multiple of 8 offset", then we can find
more opportunities for the optimization to trigger. So even for a
byte-sized type such as i32 we may now end up narrowing the load
into loading the 16 bits starting at offset 8 (if that is allowed
by the target), as sketched below. The old algorithm did not even consider that case.
5) Also start using getObjectPtrOffset instead of getMemBasePlusOffset
when creating the new ptr. This way we get "nsw" on the add.
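To make point 4 concrete, a source-level sketch (illustrative only, little-endian layout, hypothetical values) of the kind of narrowing this enables: when the compare only demands a byte-aligned 16-bit slice of a wider loaded value, the load can be shrunk to that slice at the corresponding offset, subject to the target's legality and alignment checks.
```
#include <cassert>
#include <cstdint>
#include <cstring>

// The compare only demands bits [8, 24) of the loaded i32, so instead of
// loading all 32 bits and masking, load just those 16 bits at byte
// offset 1 (little-endian layout assumed in this sketch).
bool wideCompare(const uint32_t *p) {
  return (*p & 0x00FFFF00u) == 0x00ABCD00u;
}

bool narrowedCompare(const uint32_t *p) {
  uint16_t slice;
  std::memcpy(&slice, reinterpret_cast<const char *>(p) + 1, sizeof(slice));
  return slice == 0xABCD;
}

int main() {
  uint32_t v = 0x00ABCD00u;
  assert(wideCompare(&v) == narrowedCompare(&v));
  v = 0x12ABCD34u;  // surrounding bytes do not affect either form
  assert(wideCompare(&v) && narrowedCompare(&v));
}
```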
The current APInt::multiplicativeInverse takes a modulus which can be
any value, but all in-tree callers use a power of two. Moreover, most
callers want to use two to the power of the width of an existing APInt,
which is awkward because 2^N is not representable as an N-bit APInt.
Add a new overload of multiplicativeInverse which implicitly uses
2^BitWidth as the modulus.
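For context, a scalar sketch of computing such an inverse for an odd 64-bit value (an illustration, not the APInt code): the modulus 2^64 is implicit in the type width, which is exactly why it cannot be passed as a same-width value.
```
#include <cassert>
#include <cstdint>

// Inverse of an odd value modulo 2^64.  The modulus is implicit in the
// type width: 2^64 does not fit in 64 bits.
uint64_t inverseMod2Pow64(uint64_t v) {
  uint64_t inv = v;  // v * v == 1 (mod 8), so the seed is correct to 3 bits
  for (int i = 0; i < 5; ++i)  // 3 -> 6 -> 12 -> 24 -> 48 -> 96 correct bits
    inv *= 2 - v * inv;
  return inv;
}

int main() {
  uint64_t v = 0x123456789ABCDEF1ull;    // odd
  assert(v * inverseMod2Pow64(v) == 1);  // product wraps to 1 mod 2^64
}
```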
Reverse the fold, handling it inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds.
This exposed a number of regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve DemandedElts support to handle its undef upper elements.
There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads?
Fixes #86968