llvm-project

Author	SHA1	Message	Date
David Green	676c641718	[VectorCombine] Use getInstructionCost to cost Shuffle. (#122068 ) This allows it to produce a more accurate cost for the shuffle, using the more accurate calls to getShuffleCost in getInstructionCost. It helps fix some of the regressions from vector combine a little while ago, now that we have better subvector extract costs.	2025-01-08 20:48:40 +00:00
Simon Pilgrim	322ff42315	[PhaseOrdering][AArch64] block_scaling_decompr_8bit.ll - use -passes="default<O3>" to allow DOS to correctly evaluate the RUN command Necessary for running update_test_checks.py on windows	2025-01-08 15:07:21 +00:00
Simon Pilgrim	5a7dfb4659	[CostModel][X86] Attempt to match v4f32 shuffles that map to MOVSS/INSERTPS instruction improveShuffleKindFromMask matches this as a SK_InsertSubvector of a v1f32 (which legalises to f32) into a v4f32 base vector, making it easy to recognise. MOVSS is limited to index0.	2025-01-07 11:31:44 +00:00
Simon Pilgrim	db88071a8b	[CostModel][X86] Attempt to match cheap v4f32 shuffles that map to SHUFPS instruction (#121778 ) Avoid always assuming the worst for v4f32 2 input shuffles, and match the SHUFPS pattern where possible - each pair of output elements must come from the same source register.	2025-01-06 17:54:36 +00:00
Simon Pilgrim	7cdbde70fa	[CostModel][X86] getShuffleCost - use processShuffleMasks for all shuffle kinds to legal types (#120599 ) (#121760 ) Now that processShuffleMasks can correctly handle 2 src shuffles, we can completely remove the shuffle kind limits and correctly recognize the number of active subvectors per legalized shuffle - improveShuffleKindFromMask will determine the shuffle kind for each split subvector.	2025-01-06 13:32:55 +00:00
Simon Pilgrim	67652a3d9f	[PhaseOrdering][X86] Add horizontal-sub test coverage for #34072 Matches the existing horizontal-add tests, with the additional non-commutable constraint	2025-01-06 12:15:21 +00:00
Simon Pilgrim	054e7c5971	[VectorCombine] foldInsExtVectorToShuffle - ignore shuffle costs for 'identity' insertion masks <u,1,u,u> 'inplace' single src shuffles can be treated as free identity shuffles - ignore any shuffle cost (similar to what we already do in other folds like foldShuffleOfShuffles) - eventually getShuffleCost should just return TCC_Free in these cases but in a lot of the targets' shuffle cost logic this currently ends up treated as a generic SK_PermuteSingleSrc. We still want to generate the shuffle as it will help further shuffle folds with the additional PoisonMaskElem 'undemanded' elements.	2025-01-05 13:02:31 +00:00
Simon Pilgrim	e3ec5a7286	[VectorCombine] foldShuffleOfBinops - fold shuffle(binop(shuffle(x),shuffle(z)),binop(shuffle(y),shuffle(w)) -> binop(shuffle(x,z),shuffle(y,w)) (#120984 ) Some patterns (in particular horizontal style patterns) can end up with shuffles straddling both sides of a binop/cmp. Where individually the folds aren't worth it, by merging the (oneuse) shuffles we can notably reduce the net instruction count and cost. One of the final steps towards finally addressing #34072	2025-01-03 10:29:07 +00:00
David Green	3ddc9f06ae	[AArch64] Additional shuffle subvector-extract cost tests. NFC A Phase Ordering test for intrinsic shuffles is also added, showing a recent regression from vector combining.	2025-01-02 10:13:51 +00:00
Simon Pilgrim	d1e5e6735a	[PhaseOrdering] Update test RUN lines to use `-passes="default<O3>"` to allow evaluation by DOS batch scripts. NFC. `-passes='default<O3>'` isn't correctly parsed on DOS, so when update_test_checks.py runs a system call on the opt RUN line, it fails to evaluate properly - use `-passes="default<O3>"` instead.	2024-12-31 15:25:12 +00:00
Muhammad Omair Javaid	332d2647ff	Revert "[LV]: Teach LV to recursively (de)interleave. (#89018 )" This reverts commit ccfe0de0e1e37ed369c9bf89dd0188ba0afb2e9a. This breaks LLVM build on AArch64 SVE Linux buildbots https://lab.llvm.org/buildbot/#/builders/143/builds/4462 https://lab.llvm.org/buildbot/#/builders/17/builds/4902 https://lab.llvm.org/buildbot/#/builders/4/builds/4399 https://lab.llvm.org/buildbot/#/builders/41/builds/4299	2024-12-31 03:12:24 +05:00
Florian Hahn	7f3428d3ed	[VPlan] Compute induction end values in VPlan. (#112145 ) Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on https://github.com/llvm/llvm-project/pull/110004 and https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/112145	2024-12-29 19:05:08 +00:00
Hassnaa Hamdi	ccfe0de0e1	[LV]: Teach LV to recursively (de)interleave. (#89018 ) Currently available intrinsics are only ld2/st2, which don't support interleaving factor > 2. This patch teaches the LV to use ld2/st2 recursively to support high interleaving factors.	2024-12-27 12:42:07 +00:00
Simon Pilgrim	e3f8c229f5	[VectorCombine] foldInsExtVectorToShuffle - inserting into a poison base vector can be modelled as a single src shuffle We already canonicalized an undef base vector to the RHS to improve further folding, this extends this to improve the shuffle cost estimate of the single src shuffle	2024-12-23 15:49:17 +00:00
Simon Pilgrim	29c89d7265	[VectorCombine] foldShuffleOfShuffles - fold "shuffle (shuffle x, y, m1), (shuffle y, x, m2)" -> "shuffle x, y, m3" (#120959 ) foldShuffleOfShuffles currently only folds unary shuffles to ensure we don't end up with a merged shuffle with more than 2 sources, but this prevented cases where both shuffles were sharing sources. This patch generalizes the merge process to find up to 2 sources as it merges with the inner shuffles, it also moves the undef/poison handling stages into the merge loop as well. Fixes #120764	2024-12-23 14:56:15 +00:00
Simon Pilgrim	611401c115	[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types (#120599 ) processShuffleMasks can now correctly handle 2 src shuffles, so we can use the existing SK_PermuteSingleSrc splitting cost logic to handle SK_PermuteTwoSrc as well and correctly recognise the number of active subvectors per legalised shuffle.	2024-12-20 10:39:45 +00:00
Simon Pilgrim	091448e3c1	Revert "[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types" (#120707 ) Reverts llvm/llvm-project#120599 - some recent tests are currently failing	2024-12-20 10:06:03 +00:00
Simon Pilgrim	81e63f9e0c	[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types (#120599 ) processShuffleMasks can now correctly handle 2 src shuffles, so we can use the existing SK_PermuteSingleSrc splitting cost logic to handle SK_PermuteTwoSrc as well and correctly recognise the number of active subvectors per legalised shuffle.	2024-12-20 09:55:11 +00:00
Simon Pilgrim	434819c35f	[PhaseOrdering][X86] Add test coverage for #34072 Add tests for horizontal add patterns with missing/undemanded elements - which typically prevents folding to the (add (shuffle a, b),(shuffle a, b)) optimal pattern	2024-12-19 17:32:18 +00:00
Vitaly Buka	e6a63513a2	[PhaseOrdering] Update test for #120460	2024-12-18 10:25:30 -08:00
Alexander Kornienko	23a239267e	Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460 ) Reverts llvm/llvm-project#119225 due to the lack of sanitizer support, large potential of breaking code containing latent UB, non-trivial localization and investigation, and what seems to be a bad interaction with msan (a test is in the works). Related discussions: https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822 https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255	2024-12-18 19:06:34 +01:00
Florian Hahn	4ad0fdd163	[VPlan] Remove reverse() of predecessors from VPInstruction::generate. This was originally done to reduce the diff for the change. Remove it and update the remaining tests. NFC modulo reordering of incoming values. Clean up after https://github.com/llvm/llvm-project/pull/114292.	2024-12-17 20:44:32 +00:00
Nikita Popov	1157187496	[VPlan] Propagate all GEP flags (#119899 ) Store GEPNoWrapFlags instead of only InBounds and propagate them.	2024-12-17 13:48:50 +01:00
Florian Hahn	6c8f41d336	[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292 ) As a first step to move towards modeling the full skeleton in VPlan, start by wrapping IR blocks created during legacy skeleton creation in VPIRBasicBlocks and hook them into the VPlan. This means the skeleton CFG is represented in VPlan, just before execute. This allows moving parts of skeleton creation into recipes in the VPBBs gradually. Note that this allows retiring some manual DT updates, as this will be handled automatically during VPlan execution. PR: https://github.com/llvm/llvm-project/pull/114292	2024-12-12 15:58:16 +00:00
Simon Pilgrim	625ec7ec89	[VectorCombine] Move concat-boolmasks.ll tests to be VectorCombine only Suggested on #119559	2024-12-12 11:02:01 +00:00
Vitaly Buka	89b7aea573	Revert "[VectorCombine] Fold "(or (zext (bitcast X)), (shl (zext (bitcast Y)), C))" -> "(bitcast (concat X, Y))" MOVMSK bool mask style patterns" (#119594 ) Reverts llvm/llvm-project#119559 Introduce use after free, see llvm/llvm-project#119559	2024-12-11 10:50:26 -08:00
Simon Pilgrim	08f904011f	[VectorCombine] Fold "(or (zext (bitcast X)), (shl (zext (bitcast Y)), C))" -> "(bitcast (concat X, Y))" MOVMSK bool mask style patterns (#119559 ) Mask/Bool vectors are often bitcast to/from scalar integers, in particular when concatenating mask results, often this is due to the difficulties of working with vector of bools on C/C++. On x86 this typically involves the MOVMSK/KMOV instructions. To concatenate bool masks, these are typically cast to scalars, which are then zero-extended, shifted and OR'd together. This patch attempts to match these scalar concatenation patterns and convert them to vector shuffles instead. This in turn often assists with further vector combines, depending on the cost model. Fixes #111431	2024-12-11 16:40:56 +00:00
David Green	2f18b5ef03	[AArch64] Add fpext and fpround costs (#119292 ) This adds some basic costs for fpext and fpround, many of which were already handled by the generic costing routines but this does make some adjustments for larger vector types that can use fcvtn+fcvtn2, as opposed to fcvtn+fcvtn+concat. These should now more closely match the codegen from https://godbolt.org/z/r3P9Mf8ez, for example.	2024-12-11 06:26:41 +00:00
Simon Pilgrim	c5a21c1158	[PhaseOrdering][X86] Add test coverage based off #111431 Add tests for the concatenation of boolean vectors bitcast to integers - similar to the MOVMSK pattern.	2024-12-10 17:31:08 +00:00
Simon Pilgrim	05b907f66b	[VectorCombine] foldShuffleOfShuffles - allow fold with only single shuffle operand. (#119354 ) foldShuffleOfShuffles already handles "shuffle (shuffle x, undef), (shuffle y, undef)" patterns, this patch relaxes the requirement so it can handle cases where only a single operand is a shuffle (and the other can be any other value and will be kept in place). Fixes #86068	2024-12-10 13:10:25 +00:00
Nikita Popov	e21ab4d16b	[InstCombine] Infer nuw for gep inbounds from base of object (#119225 ) When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well. The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices. Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see https://github.com/AliveToolkit/alive2/issues/1138).	2024-12-10 10:00:50 +01:00
Nikita Popov	10f315dc9c	[ConstantFolding] Infer getelementptr nuw flag (#119214 ) Infer nuw from nusw and nneg. This is the constant expression variant of https://github.com/llvm/llvm-project/pull/111144. Proof: https://alive2.llvm.org/ce/z/ihztLy	2024-12-09 16:44:05 +01:00
Nikita Popov	f7685af4a5	[InstCombine] Move gep of phi fold into separate function This makes sure that an early return during this fold doesn't end up skipping later gep folds.	2024-12-05 15:20:56 +01:00
Nikita Popov	462cb3cd6c	[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144 ) If the gep is nusw (usually via inbounds) and the offset is non-negative, we can infer nuw. Proof: https://alive2.llvm.org/ce/z/ihztLy	2024-12-05 14:36:40 +01:00
Florian Hahn	f5bc6b47e8	[PhaseOrdering] Remove -enable-matrix flag from sub-xor.ll test. The test does not use matrix intrinsics, so does not need enable-matrix.	2024-12-02 11:43:43 +00:00
Simon Pilgrim	94df95de6b	[TTI][X86] getShuffleCosts - for SK_PermuteTwoSrc, if the masks are known to be "inlane" no need to scale the costs by worst-case legalization (#117999 ) SK_PermuteTwoSrc legalization has to assume any of the legalised source registers could be referenced in split shuffles, but if we already know that each 128-bit lane only references elements from the same lane of the source operands, then this scaling won't occur. Hopefully this can help with #113356 without us having to get full processShuffleMasks canonicalization finished first.	2024-12-01 12:01:47 +00:00
Marina Taylor	8fb748b4a7	[Inliner] Don't count a call penalty for foldable __memcpy_chk and similar (#117876 ) When the size is an appropriate constant, __memcpy_chk will turn into a memcpy that gets folded away by InstCombine. Therefore this patch avoids counting these as calls for purposes of inlining costs. This is only really relevant on platforms whose headers redirect memcpy to __memcpy_chk (such as Darwin). On platforms that use intrinsics, memcpy and similar functions are already exempt from call penalties.	2024-11-29 18:28:39 +00:00
Haopeng Liu	4d6e69143d	Add the initializes attribute inference (#117104 ) reland https://github.com/llvm/llvm-project/pull/97373 after fixing clang tests. Confirmed with "ninja check-llvm" and "ninja check-clang"	2024-11-20 19:15:23 -08:00
Florian Hahn	0bb1b68330	[Local] Only intersect tbaa metadata if instr moves. (#116682 ) Preserve tbaa metadata on the replacement instruction, if it does not move. In that case, the program would be UB, if the aliasing property encoded in the metadata does not hold. This makes use of the clarification re tbaa metadata implying UB if the property does not hold: https://github.com/llvm/llvm-project/pull/116220 Same as https://github.com/llvm/llvm-project/pull/115868, but for !tbaa PR: https://github.com/llvm/llvm-project/pull/116682	2024-11-20 19:31:16 +00:00
Florian Hahn	076513646c	[Local] Only intersect llvm.access.group metadata if instr moves. (#115868 ) Preserve llvm.access.group metadata on the replacement instruction, if it does not move. In that case, the program would be UB, if the parallel property encoded in the metadata does not hold. This matches the LangRef recently updated in #116220 PR https://github.com/llvm/llvm-project/pull/115868	2024-11-19 22:01:16 +00:00
Mikhail Goncharov	f77126c549	Revert "[FunctionAttrs] Add the "initializes" attribute inference (#97373 )" This reverts commit 661c593850715881d2805a59e90e6d87d8b9fbb8. Multiple buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/108/builds/6096	2024-11-19 10:29:36 +01:00
Haopeng Liu	661c593850	[FunctionAttrs] Add the "initializes" attribute inference (#97373 ) Add the "initializes" attribute inference. This change is expected to have ~0.09% compile time regression, which seems acceptable for interprocedural DSE. https://llvm-compile-time-tracker.com/compare.php?from=9f10252c4ad7cffbbcf692fa9c953698f82ac4f5&to=56345c1cee4375eb5c28b8e7abf4803d20216b3b&stat=instructions%3Au	2024-11-18 21:36:05 -08:00
Julian Nagele	a8538b9138	[LV] Vectorize Epilogues for loops with small VF but high IC (#108190 ) - Consider MainLoopVF * IC when determining whether Epilogue Vectorization is profitable - Allow the same VF for the Epilogue as for the main loop - Use an upper bound for the trip count of the Epilogue when choosing the Epilogue VF PR: https://github.com/llvm/llvm-project/pull/108190 --------- Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-11-17 19:35:32 +00:00
Michael Maitland	6b9952759f	[SimplifyCFG] Simplify switch instruction that has duplicate arms (#114262 ) I noticed that the two C functions emitted different IR: ``` int switch_duplicate_arms(int switch_val, int v, int w) { switch (switch_val) { default: break; case 0: w = v; break; case 1: w = v; break; } return w; } int if_duplicate_arms(int switch_val, int v, int w) { if (switch_val == 0) w = v; else if (switch_val == 1) w = v; return v0; } ``` We generate IR that looks like this: ``` define i32 @switch_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) { switch i32 %1, label %7 [ i32 0, label %5 i32 1, label %6 ] 5: br label %7 6: br label %7 7: %8 = phi i32 [ %3, %4 ], [ %2, %6 ], [ %2, %5 ] ret i32 %8 } define i32 @if_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) { %5 = icmp ult i32 %1, 2 %6 = select i1 %5, i32 %2, i32 %3 ret i32 %6 } ``` For `switch_duplicate_arms`, taking case 0 and 1 are the same since %5 and %6 branch to the same location and the incoming values for %8 are the same from those blocks. We could remove one on the duplicate switch targets and update the switch with the single target. On RISC-V, prior to this patch, we generate the following code: ``` switch_duplicate_arms: li a4, 1 beq a1, a4, .LBB0_2 mv a0, a3 bnez a1, .LBB0_3 .LBB0_2: mv a0, a2 .LBB0_3: ret if_duplicate_arms: li a4, 2 mv a0, a2 bltu a1, a4, .LBB1_2 mv a0, a3 .LBB1_2: ret ``` After this patch, the O3 code is optimized to the icmp + select pair, which gives us the same code gen as `if_duplicate_arms`, as desired. This results is one less branch instruction in the final assembly. This may help with both code size and further switch simplification. I found that this patch causes no significant impact to spec2006/int/ref and spec2017/intrate/ref. --------- Co-authored-by: Min Hsu <min@myhsu.dev>	2024-11-15 15:38:34 +01:00
Julian Nagele	7c8e05aa45	[SCEV] Collect and merge loop guards through PHI nodes with multiple incoming values (#113915 ) This patch aims to strengthen collection of loop guards by processing PHI nodes with multiple incoming values as follows: collect guards for all incoming values/blocks and try to merge them into a single one for the PHI node. The goal is to determine tighter bounds on the trip counts of scalar tail loops after vectorization, helping to avoid unnecessary transforms. In particular we'd like to avoid vectorizing scalar tails of hand-vectorized loops, for example in [Transforms/PhaseOrdering/X86/pr38280.ll](`231e03ba7e/llvm/test/Transforms/PhaseOrdering/X86/pr38280.ll`), discovered via https://github.com/llvm/llvm-project/pull/108190 Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=a55248789ed3f653740e0723d016203b9d585f26&to=500e4c46e79f60b93b11a752698c520e345948e3&stat=instructions:u PR: https://github.com/llvm/llvm-project/pull/113915	2024-11-15 10:03:08 +00:00
Simon Pilgrim	1878b94568	[VectorCombine] isExtractExtractCheap - specify the extract/insert shuffle mask to improve shuffle costs (#114780 ) This shuffle mask is so focused, the cost model is very likely to be able to determine a specific (lower) cost	2024-11-13 12:31:39 +00:00
Florian Hahn	1aff96b3df	[InstCombine] Add extra tests for preserving load metadata. Test cases for https://github.com/llvm/llvm-project/issues/115595.	2024-11-09 12:24:01 +00:00
Hari Limaye	fbd89bcc66	Reland "[LTO] Run Argument Promotion before IPSCCP" (#111853 ) Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas. Relands #111163.	2024-11-06 13:54:48 +00:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548 )	2024-11-06 11:53:33 +00:00
Simon Pilgrim	ac1869aa70	[CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680 ) Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs struggle to handle this and have to fallback to worst case shuffles that reference elements from any lane. This patch detects shuffle masks that we know are "inlane" and enable us to assume a cheaper shuffle cost.	2024-11-04 10:19:02 +00:00

1 2 3 4 5 ...

701 Commits