Since N2 will be reused in the fold, we cannot skip N2's undef elements
if the corresponding element in N1 is well-defined.
For example:
```
t2: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
t24: v4i32 = BUILD_VECTOR undef:i32, undef:i32, Constant:i32<1>, undef:i32
t11: v4i32 = vselect t8, t2, t10
```
Before this patch, we fold t11 into:
```
t26: v4i32 = sign_extend t8
t27: v4i32 = add t26, t24
```
The last element of t27 is incorrect: t24's undef last element leaks into the
folded result, even though the corresponding element of t2 is well-defined.
Closes https://github.com/llvm/llvm-project/issues/129181.
Begin extending replaceShuffleOfInsert to handle other forms of scalar insertion into a vector.
I've limited this to targets that have Custom/Legal ISD::INSERT_VECTOR_ELT handling for now - although we can probably always fold this before LegalOperations.
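As a rough illustration (my own IR, not from the patch) of the basic pattern replaceShuffleOfInsert already handles, a shuffle whose source is a scalar insertion; the patch begins extending this to other ways the scalar can be inserted:
```
define <4 x i32> @shuffle_of_insert(<4 x i32> %v, i32 %s) {
  %ins = insertelement <4 x i32> %v, i32 %s, i32 0
  %shuf = shufflevector <4 x i32> %ins, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i32> %shuf
}
```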
In DAGCombiner.cpp, preserve range metadata when a load is narrowed to load the
LSBs, provided the original range metadata bounds fit in the narrower type.
Utilize the preserved range metadata to reduce a 64-bit shl to a 32-bit shl.
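A minimal IR sketch (mine, not from the patch) of the narrowing precondition: only the low 32 bits of the load are used, so the load can be narrowed, and because the !range bounds fit in i32 the metadata can now be kept on the narrower load:
```
define i32 @narrow_load_keep_range(ptr %p) {
  %v = load i64, ptr %p, !range !0   ; bounds fit in 32 bits
  %t = trunc i64 %v to i32
  ret i32 %t
}
!0 = !{i64 0, i64 32}
```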
---------
Signed-off-by: John Lu <John.Lu@amd.com>
The target hook convertSelectOfConstantsToMath() needs to be consulted in the
SimplifySelectCC helper in SelectionDAG ISel, where a generic select of
constants is folded into simple math ops driven by the condition.
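For reference, a hedged sketch (my own example) of the kind of select that can be folded into math, e.g. into a zext of the condition plus a constant; the hook lets targets that prefer a real select opt out:
```
define i32 @sel_const(i1 %c) {
  ; select %c, 5, 4  can become  add (zext %c), 4
  %r = select i1 %c, i32 5, i32 4
  ret i32 %r
}
```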
Fixes #121145.
`N` may get merged with existing nodes inside the loop. Early exit when
it is deleted to avoid the crash.
Alternative solution: use `DAGNodeDeletedListener` to refresh the value
of N.
Closes https://github.com/llvm/llvm-project/issues/128143.
Shift amounts in SelectionDAG don't have to match the result type
of the shift. SelectionDAGBuilder will aggressively truncate shift
amounts to the target's preferred type. This may result in a zero extend
that existed in IR being removed.
If we look through a truncate here, we can't guarantee the upper bits
of the truncate input are zero. There may have been a zext that was
removed. Unfortunately, this regresses tests where no truncate was
involved. The only way I can think to fix this is to add an assertzext
when SelectionDAGBuilder truncates a shift amount or remove the
early truncation of shift amounts from SelectionDAGBuilder altogether.
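An assumed illustration (mine) of the situation described: the zero extend of the shift amount is explicit in IR, but SelectionDAGBuilder may truncate the amount to the target's preferred shift type and drop the zext, so a later combine that looks through a truncate cannot assume the truncated-away bits were zero:
```
define i64 @shl_zext_amt(i64 %x, i8 %a) {
  %amt = zext i8 %a to i64   ; may disappear when the amount is truncated
  %r = shl i64 %x, %amt
  ret i64 %r
}
```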
Fixes #126889.
Fix the case where the vector element type of the loaded extractelement
input does not match the result type of the extract.
This fixes a regression reported after
c55a7659b38946350315ac4a18d9805deb1f0a54
Happened to notice some odd things related to chains in this code.
The code calls hasOneUse on LoadSDNode* which will check users
of the data and the chain. I think this was trying to check that
the data had one use so one of the loads would definitely be
removed by the transform. Load chains don't always have users, so
our testing may not have noticed that a used chain would block the
transform.
The code makes all users of ld1's chain use the new load's chain, but
we don't know that ld1 becomes dead. This can cause incorrect dependencies if
ld1's chain is used and it isn't deleted. I think the better thing to do
is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1
depend on the new load and the original loads. If the old loads become
dead, their chain will be cleaned up later.
I'm having trouble getting a test for any ordering issue with the current code.
areNonVolatileConsecutiveLoads requires the two loads to have the same
input chain. Given that, I don't know how to use one of the load chain
results without also using the other. If they are both used we don't
do the transform because SDNode::hasOneUse will return false for both.
Previously this combine would undo AMDGPU's new custom legalization of
wide vector shuffles into 2 element pieces. The comment also
states that this combine is only done before legalization,
but the case with a build_vector source was unconditional.
We probably don't want to do this if the multiple uses are full
scalarization of the vector, but this seems to work well enough.
Scalarizing extracts should have folded out pre-legalize.
Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.
This should be a NFC change for targets that enable AA during codegen
(such as AArch64), but also give a nice compile-time improvement in some
cases. See:
https://github.com/llvm/llvm-project/pull/123787#issuecomment-2606797041
Note: This follows Nikita's suggestion on #123787.
For shuffle vector splats with undef lanes in the mask,
this was introducing real values. Filter out build_vector
results based on the undef elements in the mask.
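A small sketch (my own IR) of the shape in question: a splat-style shuffle whose mask has undef lanes; the combine should not materialize real values for those lanes when forming a build_vector:
```
define <4 x i32> @splat_with_undef_lanes(<4 x i32> %v) {
  %s = shufflevector <4 x i32> %v, <4 x i32> poison, <4 x i32> <i32 0, i32 undef, i32 0, i32 undef>
  ret <4 x i32> %s
}
```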
This avoids AMDGPU test regressions in a future change.
test/CodeGen/X86/urem-seteq-illegal-types.ll looks worse
but I didn't investigate.
PR https://github.com/llvm/llvm-project/pull/118823 added a
DAG combine for extracting elements of a vector returned from
SETCC, however it doesn't correctly deal with the case where
the vector element type is not i1. In this case we have to
take account of the boolean contents, which are represented
differently between vectors and scalars. The code now
explicitly performs an inreg sign extend in order to get the
same result.
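A rough IR sketch (mine, not from the PR) of the situation: the vector compare result is carried in non-i1 lanes whose boolean representation is all-ones/all-zeros, so the scalarized compare has to be sign-extended in-register to match what extracting from the wide vector would have produced:
```
define i32 @extract_vsetcc_nonbool(<4 x i32> %a, <4 x i32> %b) {
  %cmp = icmp slt <4 x i32> %a, %b
  %ext = sext <4 x i1> %cmp to <4 x i32>   ; vector-boolean lanes are all-ones/all-zeros
  %elt = extractelement <4 x i32> %ext, i32 1
  ret i32 %elt
}
```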
Fixes https://github.com/llvm/llvm-project/issues/121372
This avoids regressions in a future AMDGPU commit. Previously, a
build_vector (extract_vector_elt x), undef (where the elements are freely
accessible) would be bloated into a shuffle of one element + undef,
which has much worse combine support than the extract.
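Roughly the shape of the pattern described, as my own IR sketch: one extracted element placed into a vector whose remaining lane is left undefined; keeping the extract is preferable to forming a single-element shuffle:
```
define <2 x i32> @build_of_extract(<4 x i32> %x) {
  %e = extractelement <4 x i32> %x, i32 2
  %v = insertelement <2 x i32> poison, i32 %e, i32 0
  ret <2 x i32> %v
}
```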
Alternatively could check aggressivelyPreferBuildVectorSources, but
I'm not sure it's really different than isExtractVecEltCheap.
This avoids some of the pending regressions after AMDGPU implements
isExtractVecEltCheap.
In a case like shl <value, undef>, splat k, because the second operand
was fully defined, we would fall through and use the splat value for the
first operand, losing the undef high bits. This would result in an additional
instruction to handle the high bits. Add some reduced testcases for different
opcodes for one of the regressions.
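A minimal IR reduction (mine, not one of the added testcases) of the described shape; the correct constant fold keeps the high lane undefined instead of substituting the splat value:
```
define <2 x i32> @shl_undef_high() {
  ; expected fold: <i32 112, i32 undef>, not a fully defined splat-based result
  %r = shl <2 x i32> <i32 7, i32 undef>, <i32 4, i32 4>
  ret <2 x i32> %r
}
```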
scalar_to_vector is difficult to make appear and test,
but I found one case where this makes an observable difference.
It fires more often than this in the test suite, but most of them
have no net result in the final code. This helps reduce regressions
in a future commit.
This code would check whether the element types mixed integer and FP, pick an
equivalently sized FP type to use as the vector element type, and only insert
casts when integers were involved. But we need to insert a cast whenever the
types are mixed, which may include a mix of different FP types.
Fixes #121601
This pattern was originally spotted in 429.mcf by @topperc.
We already have a DAGCombiner pattern to turn `(neg (abs x))` into `(min
x, (neg x))`. But in some cases `(neg (max x, (neg x)))` is formed by an
expanded `abs` followed by a `neg` that is generated only after the
`abs` expansion. This patch adds a separate pattern to match cases like
this, as well as its inverse pattern: `(neg (min X, (neg X))) --> (max
X, (neg X))`.
This pattern is applicable to both signed and unsigned min/max.
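A small IR example of my own showing the expanded form being matched: abs has already been expanded to smax(x, -x) and is then negated, which the new pattern turns into smin(x, -x) via the identity -max(a, b) = min(-a, -b):
```
define i32 @neg_of_expanded_abs(i32 %x) {
  %neg = sub i32 0, %x
  %max = call i32 @llvm.smax.i32(i32 %x, i32 %neg)   ; expanded abs
  %res = sub i32 0, %max                             ; neg (max x, (neg x))
  ret i32 %res                                       ; folds to min(x, -x)
}
declare i32 @llvm.smax.i32(i32, i32)
```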
Limits #117900 to only fold when scalar_to_vector doesn't perform implicit truncation, as the scaled shift calculation doesn't currently account for this - this can be addressed in a future update.
Fixes #121306
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
Most of these are just places that want the first user and aren't
iterating over the whole list.
While there I changed some use_size() == 1 to hasOneUse() which
is more efficient.
This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.
I've long been annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
This adds a new helper `canFoldStoreIntoLibCallOutputPointers()` to
check that it is safe to fold a store into a node that will expand to a
library call that takes output pointers. This requires checking for two
(independent) properties:
1. The store is not within a CALLSEQ_START..CALLSEQ_END pair
* If it is, the expansion would lead to nested call sequences (which is
invalid)
2. The node does not appear as a predecessor to the store
* If it does, attempting to merge the store into the call would result
in a cycle in the DAG
These two properties are checked as part of the same traversal in
`canFoldStoreIntoLibCallOutputPointers()`
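As a hedged illustration (my own example, assuming a target where llvm.frexp expands to a frexp libcall that takes an output pointer): the store of the exponent below could be folded into the call's output pointer, but only when the two properties above hold:
```
define float @frexp_store_exp(float %x, ptr %expp) {
  %r = call { float, i32 } @llvm.frexp.f32.i32(float %x)
  %frac = extractvalue { float, i32 } %r, 0
  %exp = extractvalue { float, i32 } %r, 1
  store i32 %exp, ptr %expp        ; candidate store to fold into the libcall
  ret float %frac
}
declare { float, i32 } @llvm.frexp.f32.i32(float)
```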
ISD::isBuildVectorAllOnes can peek through bitcasts, so under certain circumstances this could match FP NaN-ish data (e.g. double (bitcast i64 -1)). Bail out if the type isn't an integer and let bitcast folding handle it first.
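For illustration (my own example): an all-ones integer vector bitcast to FP carries a NaN bit pattern, yet isBuildVectorAllOnes still recognizes it through the bitcast:
```
define <2 x double> @allones_seen_through_bitcast() {
  ; 0xFFFFFFFFFFFFFFFF reinterpreted as double is a NaN
  ret <2 x double> bitcast (<2 x i64> <i64 -1, i64 -1> to <2 x double>)
}
```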
Fixes #120093
This patch makes a couple of improvements to ReduceLoadOpStoreWidth.
When determining the minimum size of "NewBW" we now take byte boundaries
into account. If we for example touch bits 6-10 we shouldn't accept
NewBW=8, because we would fail later when detecting that we can't access
bits from two different bytes in memory using a single load. Instead we
make sure to align LSB/MSB according to byte size boundaries up front
before searching for a viable "NewBW".
In the past we only tried to find a "ShAmt" that was a multiple of
"NewBW", but now we use a sliding window technique to scan for a viable
"ShAmt" that is a multiple of the byte size. This can help out finding
more opportunities for optimization (especially if the original type
isn't byte sized, and for big-endian targets when the original
load/store is aligned on the most significant bit).
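A small IR example of my own (not one of the added tests) of a load-op-store sequence this combine narrows: only a byte-aligned slice of the value changes, so on a little-endian target the access can be reduced to a single byte at offset 1:
```
define void @narrow_store(ptr %p) {
  %v = load i32, ptr %p
  %m = and i32 %v, -65281        ; 0xFFFF00FF: only bits 8..15 are cleared
  store i32 %m, ptr %p
  ret void
}
```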
DAGCombiner::ReduceLoadOpStoreWidth could replace memory accesses
with narrower loads/stores, although sometimes the new load/store
would touch memory outside the original object. That seemed wrong
and this patch is simply avoiding doing the DAG combine in such
situations.
Also simplifying the expression used to align ShAmt down to a multiple
of NewBW. Subtracting (ShAmt % NewBW) should do the same thing as the
old more complicated expression.
Intention is to follow up with a patch that makes more attempts, trying
to align the memory accesses at other offsets, allowing the transform to
trigger in more situations. The current strategy for deciding the
size (NewBW) and offset (ShAmt) for the narrowed operations is a bit
ad hoc, and does not really consider big-endian memory order in the same
way as little-endian.
Adding test cases related to narrowing of load-op-store sequences.
ReduceLoadOpStoreWidth isn't careful enough, so it may end up
creating load/store operations that access memory outside the region
touched by the original load/store. Using ARM as a target for the
test cases to show what happens for both little-endian and big-endian.
This patch also adds a way to override the TLI.isNarrowingProfitable
check in DAGCombiner::ReduceLoadOpStoreWidth by using the option
-combiner-reduce-load-op-store-width-force-narrowing-profitable.
The idea is that it should be simpler to, for example, add lit tests
verifying that the code is correct for big-endian (which is otherwise
difficult since no in-tree big-endian target overrides
TLI.isNarrowingProfitable).
This is a pre-commit for
https://github.com/llvm/llvm-project/pull/119203
[Reverts d57892a2a153ab71a796f07e39d939eae6910c21]
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such as add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
---------
Co-authored-by: Paul Walker <paul.walker@arm.com>
Add DAG legalization support for expanding i1 SETCC nodes using
appropriate logical operations to simulate integer comparisons. Use
these expansions to handle i1 SETCC in NVPTX.
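A sketch of my own (not taken from the patch) of the sort of expansion meant: an i1 equality compare can be computed with plain logical ops, and other predicates map to similar and/or/xor combinations:
```
define i1 @i1_seteq(i1 %a, i1 %b) {
  ; seteq on i1 is not(a xor b)
  %x = xor i1 %a, %b
  %r = xor i1 %x, true
  ret i1 %r
}
```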
Fixes #58428 and #57405
This DAG combine was incorrect for big-endian targets, because it
assumes that when a bitcast changes the lane width, the
least-significant bits of the wider lanes are in the lower-numbered
lanes of the smaller type, which is only true for little-endian.
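For illustration (my own example): after the bitcast below, i16 lane 0 holds the low 16 bits of i32 lane 0 on a little-endian target but the high 16 bits on a big-endian one, so reasoning based on lane numbers alone is endian-dependent:
```
define <8 x i16> @bitcast_lane_order(<4 x i32> %v) {
  %b = bitcast <4 x i32> %v to <8 x i16>
  ret <8 x i16> %b
}
```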
When extracting a smaller integer from a scalar_to_vector source, we were limited to only folding/truncating the lowest bits of the scalar source.
This patch extends the fold to handle extraction of any other element, by right shifting the source before truncation.
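Roughly the shape involved, as my own IR sketch (little-endian view): extracting element 1 below is equivalent to truncating after a right shift of the scalar source by 16, which is what the extended fold now produces:
```
define i16 @extract_elt1_of_widened_scalar(i64 %x) {
  %v = bitcast i64 %x to <4 x i16>
  %e = extractelement <4 x i16> %v, i32 1   ; same as trunc(lshr %x, 16) on little-endian
  ret i16 %e
}
```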
Fixes a regression from #117884
Currently the function will walk the entire DAG to find other candidates
to perform a post-inc store. This leads to very long compilation times
on large functions. Added a MaxSteps limit to avoid this, which is also
aligned to how hasPredecessorHelper is used elsewhere in the code.
For some reason there was a hasOneUse check on the splat for the
second operand and it's not obvious to me why. The check blocks
optimisations for lowering of nodes like AVGFLOORU and AVGCEILU.
In a follow-on patch I also plan to improve the generated code
for AVGCEILU further by teaching computeKnownBits about
zero-extending masked loads.