llvm-project

Author	SHA1	Message	Date
Florian Hahn	94ed57dab6	[PhaseOrdering] Add test for #85551 . Add test for missed hoisting of checks from std::span https://github.com/llvm/llvm-project/issues/85551	2024-04-10 13:30:30 +01:00
Simon Pilgrim	212b2bbcd1	[VectorCombine][X86] foldShuffleOfCastops - fold shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) iff cost efficient (#87510 ) Based off the existing foldShuffleOfBinops fold Fixes #67803	2024-04-04 11:22:37 +01:00
Monad	56b3222b79	[InstCombine] Remove the canonicalization of `trunc` to `i1` (#84628 ) Remove the canonicalization of `trunc` to `i1` according to the suggestion of https://github.com/llvm/llvm-project/pull/83829#issuecomment-1986801166 `a84e66a92d/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp (L737-L745)` Alive2: https://alive2.llvm.org/ce/z/cacYVA	2024-03-29 21:47:35 +08:00
Florian Hahn	06bb8c9f20	[VPlan] Explicitly handle scalar pointer inductions. (#83068 ) Add a new PtrAdd opcode to VPInstruction that corresponds to IRBuilder::CreatePtrAdd, which creates a GEP with source element type i8. This is then used to model scalarizing VPWidenPointerInductionRecipe by introducing scalar-steps to model the index increment followed by a PtrAdd. Note that PtrAdd needs to be able to generate code for only the first lane or for all lanes. This may warrant introducing a separate recipe for scalarizing that can be created without relying on the underlying IR. Depends on https://github.com/llvm/llvm-project/pull/80271 PR: https://github.com/llvm/llvm-project/pull/83068	2024-03-26 16:01:57 +01:00
Simon Pilgrim	ee5e027cc6	[X86] getShuffleCost - recognise concat_vector(X,Y) shuffle as InsertSubvector instead of PermuteTwoSrc We don't have a concat_vector shuffle kind and improveShuffleKindFromMask won't alter the base type to match it as InsertSubvector. But since this is how X86 will lower concat_vector anyhow, just recognise it explicitly. Another step for #67803	2024-03-21 09:29:39 +00:00
Simon Pilgrim	7812fcf3d7	[VectorCombine] foldBitcastShuf - add support for binary shuffles (REAPPLIED) Generalise fold to "bitcast (shuf V0, V1, MaskC) --> shuf (bitcast V0), (bitcast V1), MaskC'". Reapplied with a clang codegen test fix. Further prep work for #67803	2024-03-20 15:06:19 +00:00
Simon Pilgrim	ada24ae5e6	Revert 2ac85d8d200a9e1e0ced501c2d2f04404c400bd9 "[VectorCombine] foldBitcastShuf - add support for binary shuffles" Breaks some tests in other subprojects - will recommit with a fix later	2024-03-20 13:39:42 +00:00
Simon Pilgrim	2ac85d8d20	[VectorCombine] foldBitcastShuf - add support for binary shuffles Generalise fold to "bitcast (shuf V0, V1, MaskC) --> shuf (bitcast V0), (bitcast V1), MaskC'". Further prep work for #67803	2024-03-20 13:19:30 +00:00
Michele Scandale	09eb9f1136	[InstCombine] Fix for folding `select` into floating point binary operators. (#83200 ) Folding a `select` into a floating point binary operators can only be done if the result is preserved for both case. In particular, if the other operand of the `select` can be a NaN, then the transformation won't preserve the result value.	2024-03-19 09:47:07 -07:00
Philip Reames	0d38f21e4a	[SCEV] Extend type hint in analysis output to all backedge kinds This extends the work from 7755c26 to all of the different backend taken count kinds that we print for the scev analysis printer. As before, the goal is to cut down on confusion as i4 -1 is a very different (unsigned) value from i32 -1.	2024-03-06 13:08:05 -08:00
Philip Reames	e946b5a87b	[SCEV] Autogenerate more scev analysis check tests	2024-03-06 12:42:19 -08:00
Simon Pilgrim	08e036e734	[PhaseOrdering][X86] Add test coverage for #67803	2024-03-05 15:50:37 +00:00
Paul Kirth	2fef685363	[llvm][loop-rotate] Allow forcing loop-rotation (#82828 ) Many profitable optimizations cannot be performed at -Oz, due to unrotated loops. While this is worse for size (minimally), many of the optimizations significantly reduce code size, such as memcpy optimizations and other patterns found by loop idiom recognition. Related discussion can be found in issue #50308. This patch adds an experimental, backend-only flag to allow loop header duplication, regardless of the optimization level. Downstream consumers can experiment with this flag, and if it is profitable, we can adjust the compiler's defaults accordingly, and expose any useful frontend flags to opt into the new behavior.	2024-02-29 13:46:13 -08:00
Florian Hahn	dce77a3579	[IndVars] Preserve flags of narrow IV inc if replacing with wider inc. (#80446 ) We are replacing a narrow IV increment with a wider one. If the original (narrow) increment did not wrap, the wider one should not wrap either. Set the flags to be the union of both wide increment and original increment; this ensures we preserve flags SCEV could infer for the wider increment. Fixes https://github.com/llvm/llvm-project/issues/71517.	2024-02-10 18:11:17 +00:00
Florian Hahn	a0635edc59	[PhaseOrdering] Add tests showing missed simplifications. Add tests showing missed simplifications due to phase ordering.	2024-02-09 15:48:01 +00:00
Nilanjana Basu	c1c5b854ad	[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725 ) A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.	2024-02-05 17:23:58 -08:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Nikita Popov	cd7ea4ea65	[LAA] Drop alias scope metadata that is not valid across iterations (#79161 ) LAA currently adds memory locations with their original AATags to AST. However, scoped alias AATags may be valid only within one loop iteration, while LAA reasons across iterations. Fix this by determining which alias scopes are defined inside the loop, and drop AATags that reference these scopes. Fixes https://github.com/llvm/llvm-project/issues/79137.	2024-01-24 11:20:16 +01:00
Nikita Popov	543cf08636	[PhaseOrdering] Add additional test for #79161 (NFC)	2024-01-24 10:46:11 +01:00
Florian Hahn	f47c4067fd	[PhaseOrder] Add test where indvars dropping NSW prevents vectorization. End-to-end test for https://github.com/llvm/llvm-project/issues/71517, testing IndVars/LoopVectorize interaction	2024-01-23 11:37:04 +00:00
Nikita Popov	7c00a5be5c	[PhaseOrdering] Regenerate test checks (NFC)	2024-01-09 15:09:17 +01:00
Florian Hahn	18ec3304a9	[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe. As suggested as follow-up in https://github.com/llvm/llvm-project/pull/72164, manage inbounds via VPRecipeWithIRFlags. Note that in some cases we can now preserve inbounds in a few more cases.	2024-01-07 13:58:05 +00:00
Yingwei Zheng	7c3bcc307a	[InstCombine] Fold `switch(zext/sext(X))` into `switch(X)` (#76988 ) This patch folds `switch(zext/sext(X))` into `switch(X)`. The original motivation of this patch is to optimize a pattern found in cvc5. For example: ``` %bf.load.i = load i16, ptr %d_kind.i, align 8 %bf.clear.i = and i16 %bf.load.i, 1023 %bf.cast.i = zext nneg i16 %bf.clear.i to i32 switch i32 %bf.cast.i, label %if.else [ i32 335, label %if.then i32 303, label %if.then ] if.then: ; preds = %entry, %entry %d_children.i.i = getelementptr inbounds %"class.cvc5::internal::expr::NodeValue", ptr %0, i64 0, i32 3 %cmp.i.i.i.i.i = icmp eq i16 %bf.clear.i, 1023 %cond.i.i.i.i.i = select i1 %cmp.i.i.i.i.i, i32 -1, i32 %bf.cast.i ``` `%cmp.i.i.i.i.i` always evaluates to false because `%bf.clear.i` can only be 335 or 303. Folding `switch i32 %bf.cast.i` to `switch i16 %bf.clear.i` will help `CVP` to handle this case. See also https://github.com/llvm/llvm-project/pull/76928#issuecomment-1877055722. Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=7954c57124b495fbdc73674d71f2e366e4afe522&to=502b13ed34e561d995ae1f724cf06d20008bd86f&stat=instructions:u \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|+0.03%\|+0.06%\|+0.07%\|+0.00%\|-0.02%\|-0.03%\|+0.02%\|	2024-01-06 04:30:07 +08:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164 ) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes future and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Yingwei Zheng	1228becf7d	[FuncAttrs] Deduce `noundef` attributes for return values (#76553 ) This patch deduces `noundef` attributes for return values. IIUC, a function returns `noundef` values iff all of its return values are guaranteed not to be `undef` or `poison`. Definition of `noundef` from LangRef: ``` noundef This attribute applies to parameters and return values. If the value representation contains any undefined or poison bits, the behavior is undefined. Note that this does not refer to padding introduced by the type’s storage representation. ``` Alive2: https://alive2.llvm.org/ce/z/g8Eis6 Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|+0.01%\|+0.01%\|-0.01%\|+0.01%\|+0.03%\|-0.04%\|+0.01%\| The motivation of this patch is to reduce the number of `freeze` insts and enable more optimizations.	2023-12-31 20:44:48 +08:00
Florian Hahn	fbcf8a8cbb	[ConstraintElim] Add (UGE, var, 0) to unsigned system for new vars. (#76262 ) The constraint system used for ConstraintElimination assumes all varibles to be signed. This can cause missed optimization in the unsigned system, due to missing the information that all variables are unsigned (non-negative). Variables can be marked as non-negative by adding Var >= 0 for all variables. This is done for arguments on ConstraintInfo construction and after adding new variables. This handles cases like the ones outlined in https://discourse.llvm.org/t/why-does-llvm-not-perform-range-analysis-on-integer-values/74341 The original example shared above is now handled without this change, but adding another variable means that instcombine won't be able to simplify examples like https://godbolt.org/z/hTnra7zdY Adding the extra variables comes with a slight compile-time increase https://llvm-compile-time-tracker.com/compare.php?from=7568b36a2bc1a1e496ec29246966ffdfc3a8b87f&to=641a47f0acce7755e340447386013a2e086f03d9&stat=instructions:u stage1-O3 stage1-ReleaseThinLTO stage1-ReleaseLTO-g stage1-O0-g +0.04% +0.07% +0.05% +0.02% stage2-O3 stage2-O0-g stage2-clang +0.05% +0.05% +0.05% https://github.com/llvm/llvm-project/pull/76262	2023-12-23 15:53:48 +01:00
Florian Hahn	e9a56ab316	[PhaseOrdering] Add test with removable chained conditions. Based on https://godbolt.org/z/hTnra7zdY, which is a slightly more complicated version of the example from https://discourse.llvm.org/t/why-does-llvm-not-perform-range-analysis-on-integer-values/74341	2023-12-22 19:44:20 +00:00
Nikita Popov	273a0c9c07	[PhaseOrdering] Add data layout to test (NFC) Needed for switch to lookup table optimization.	2023-12-20 11:49:34 +01:00
Nikita Popov	5ab5810054	[PhaseOrdering] Add additional test for switch with GEPs (NFC)	2023-12-20 11:41:46 +01:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
Nikita Popov	cf47af493b	[InstCombine] Generalize folds for inversion of icmp operands (#74317 ) We have a bunch of folds that basically perform X pred Y to ~Y pred ~X for various special cases where this saves an instruction. Generalize these folds to use isFreeToInvert(). We have to make sure that we consume an instruction in either of the inversions, otherwise we're just going to swap the icmp back and forth. Fixes https://github.com/llvm/llvm-project/issues/74302.	2023-12-08 11:25:41 +01:00
Nikita Popov	d77067d08a	[ValueTracking] Add dominating condition support in computeKnownBits() (#73662 ) This adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementation uses a DomConditionCache, which stores which branches may provide information that is relevant for a given value. DomConditionCache is similar to AssumptionCache, but does not try to do any kind of automatic tracking. Relevant branches have to be explicitly registered and invalidated values explicitly removed. The necessary tracking is done inside InstCombine. The reason why this doesn't just do exactly the same thing as AssumptionCache is that a lot more transforms touch branches and branch conditions than assumptions. AssumptionCache is an immutable analysis and mostly gets away with this because only a handful of places have to register additional assumptions (mostly as a result of cloning). This is very much not the case for branches. This change regresses compile-time by about ~0.2%. It also improves stage2-O0-g builds by about ~0.2%, which indicates that this change results in additional optimizations inside clang itself. Fixes https://github.com/llvm/llvm-project/issues/74242.	2023-12-06 14:17:18 +01:00
Craig Topper	7ec4f6094e	[InstCombine] Infer disjoint flag on Or instructions. (#72912 ) The disjoint flag was recently added to IR in #72583 We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.	2023-12-02 14:11:12 -08:00
Craig Topper	03d4a9d94d	[InstCombine] Set disjoint flag when turning Add into Or. (#72702 ) The disjoint flag was recently added to IR in #72583	2023-11-27 12:54:11 -08:00
Florian Hahn	4ccdab3636	[ConstraintElim] Use isKnownNonNegative for condition transfer. (#72879 ) Use isKnownNonNegative for information transfer. This can improve results, in cases where ValueTracking can infer additional non-negative info, e.g. for phi nodes. This allows simplifying the check from https://github.com/llvm/llvm-project/issues/63126 by ConstraintElimination. It is also simplified by IndVarSimplify now; note the changes in llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll, due to this now being simplified earlier.	2023-11-21 10:09:35 +00:00
Florian Hahn	10c0166909	[PhaseOrdering] Add tests where early sinking prevents if-conversion.	2023-11-16 20:31:21 +00:00
Yingwei Zheng	e8fe15ccf1	[InstCombine] Add exact flags for ext idiom `shr (shl X, Y), Y` (#72483 ) This patch adds exact flags for sext/zext idiom `shr (shl X, Y), Y`. Alive2: https://alive2.llvm.org/ce/z/xYFpfB We can generalize it to handle pattern `shr (shl X, Y), Z` with `Y u>= Z` (e.g., non-splat vectors). But I don't think it's worth the effort. This missed optimization is discovered with the help of https://github.com/AliveToolkit/alive2/pull/962.	2023-11-16 17:30:01 +08:00
Yingwei Zheng	dc6d077396	[CVP] Infer nneg on existing zext (#72052 ) This patch infers `nneg` flags for existing zext instructions in CVP. After https://github.com/llvm/llvm-project/pull/71534 and this patch, we can drop `zext -> zext nneg` transform in `RISCVCodeGenPrepare`: `40671bbdef/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp (L74-L83)` This is an alternative to #72049.	2023-11-13 22:41:37 +08:00
Léonard Oest O'Leary	ff36411b23	[InstCombine] Use zext's nneg flag for icmp folding (#70845 ) This PR fixes https://github.com/llvm/llvm-project/issues/55013 : the max intrinsics is not generated for this simple loop case : https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being folded into a select, thus not generating the max intrinsics. For the story : Since LLVM 14, SCCP pass got smarter by folding sext into zext for positive ranges : https://reviews.llvm.org/D81756. After this change, InstCombine was sometimes unable to fold ICMP correctly as both of the arguments pointed to mismatched zext/sext. To fix this, @rotateright implemented this fix : https://reviews.llvm.org/D124419 that tries to resolve the mismatch by knowing if the argument of a zext is positive (in which case, it is like a sext) by using ValueTracking, however ValueTracking is not smart enough to infer that the value is positive in some cases. Recently, @nikic implemented #67982 which keeps the information that a zext is non-negative. This PR simply uses this information to do the folding accordingly. TLDR : This PR uses the recent nneg tag on zext to fold the icmp accordingly in instcombine. This PR also contains test cases for sext/zext folding with InstCombine as well as a x86 regression tests for the max/min case.	2023-11-13 00:53:53 +08:00
dewen	3b82336188	Revert "[PM] Execute IndVarSimplifyPass precede RessociatePass" (#71617 ) Reverts llvm/llvm-project#71054	2023-11-08 09:22:55 +08:00
dewen	e4d27d7f32	[PM] Execute IndVarSimplifyPass precede RessociatePass (#71054 ) ReassociatePass may clear nsw/nuw flags of some instructions, which may have side effects on optimizations in IndVarSimplifyPass.	2023-11-08 09:21:17 +08:00
Johannes Doerfert	3de645efe3	[OpenMP][NFC] Split the reduction buffer size into two components Before we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts, the size of the reduction data (=all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have less teams than the maximal length. It also allows us to move code from clangs codegen into the runtime as we now know how large the reduction data is.	2023-11-06 11:50:41 -08:00
Florian Hahn	ab6bd9436a	[ConstraintElim] Add tests for additional SGT->UGT transfer. Test cases inspired by https://github.com/llvm/llvm-project/issues/63126.	2023-11-03 13:38:39 +00:00
Craig Topper	55c9f24344	[CVP] Infer nneg on zext when forming from non-negative sext. (#70715 ) Builds on #67982 which recently introduced the nneg flag on a zext instruction.	2023-10-30 13:48:27 -07:00
Philip Reames	3f2ed812f0	[InstCombine] Infer nneg on zext when forming from non-negative sext (#70706 ) Builds on #67982 which recently introduced the nneg flag on a zext instruction. InstCombine is one of our largest canonicalizers of zext from non-negative sext instructions, so set the flag there.	2023-10-30 12:09:43 -07:00
Amara Emerson	2228b35f93	Revert "Revert "[InstCombine] Add oneuse checks to shr + cmp constant folds."" This reverts commit d37b283cdd37feca5ea71456cf350005add268e7. There was a simple logic bug in the else path. Tests codegen is different with the fix.	2023-10-28 03:12:15 -07:00
Amara Emerson	d37b283cdd	Revert "[InstCombine] Add oneuse checks to shr + cmp constant folds." This reverts commit a66051c68a43af39f9fd962f71d58ae0efcf860d. This seems to have caused issue #70509 so reverting until I have time to investigate.	2023-10-27 14:27:58 -07:00
Mehdi Amini	f390a76b7e	Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )"" This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab. Reapply the original commit, the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.	2023-10-26 17:30:01 -07:00
Mehdi Amini	ddbaa11e9f	Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )" This reverts commit c2a1249a8257ed033a98e32e425539c6da6700ec. The MLIR bots are broken with an omp test failure.	2023-10-26 17:25:20 -07:00
Johannes Doerfert	c2a1249a82	[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 ) The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.	2023-10-26 14:46:55 -07:00

1 2 3 4 5 ...

573 Commits