llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	212b2bbcd1	[VectorCombine][X86] foldShuffleOfCastops - fold shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) iff cost efficient (#87510) Based off the existing foldShuffleOfBinops fold Fixes #67803	2024-04-04 11:22:37 +01:00
Monad	56b3222b79	[InstCombine] Remove the canonicalization of `trunc` to `i1` (#84628) Remove the canonicalization of `trunc` to `i1` according to the suggestion of https://github.com/llvm/llvm-project/pull/83829#issuecomment-1986801166 `a84e66a92d/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp (L737-L745)` Alive2: https://alive2.llvm.org/ce/z/cacYVA	2024-03-29 21:47:35 +08:00
Simon Pilgrim	ee5e027cc6	[X86] getShuffleCost - recognise concat_vector(X,Y) shuffle as InsertSubvector instead of PermuteTwoSrc We don't have a concat_vector shuffle kind and improveShuffleKindFromMask won't alter the base type to match it as InsertSubvector. But since this is how X86 will lower concat_vector anyhow, just recognise it explicitly. Another step for #67803	2024-03-21 09:29:39 +00:00
Simon Pilgrim	7812fcf3d7	[VectorCombine] foldBitcastShuf - add support for binary shuffles (REAPPLIED) Generalise fold to "bitcast (shuf V0, V1, MaskC) --> shuf (bitcast V0), (bitcast V1), MaskC'". Reapplied with a clang codegen test fix. Further prep work for #67803	2024-03-20 15:06:19 +00:00
Simon Pilgrim	ada24ae5e6	Revert 2ac85d8d200a9e1e0ced501c2d2f04404c400bd9 "[VectorCombine] foldBitcastShuf - add support for binary shuffles" Breaks some tests in other subprojects - will recommit with a fix later	2024-03-20 13:39:42 +00:00
Simon Pilgrim	2ac85d8d20	[VectorCombine] foldBitcastShuf - add support for binary shuffles Generalise fold to "bitcast (shuf V0, V1, MaskC) --> shuf (bitcast V0), (bitcast V1), MaskC'". Further prep work for #67803	2024-03-20 13:19:30 +00:00
Simon Pilgrim	08e036e734	[PhaseOrdering][X86] Add test coverage for #67803	2024-03-05 15:50:37 +00:00
Nilanjana Basu	c1c5b854ad	[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725) A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.	2024-02-05 17:23:58 -08:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Nikita Popov	cd7ea4ea65	[LAA] Drop alias scope metadata that is not valid across iterations (#79161) LAA currently adds memory locations with their original AATags to AST. However, scoped alias AATags may be valid only within one loop iteration, while LAA reasons across iterations. Fix this by determining which alias scopes are defined inside the loop, and drop AATags that reference these scopes. Fixes https://github.com/llvm/llvm-project/issues/79137.	2024-01-24 11:20:16 +01:00
Nikita Popov	543cf08636	[PhaseOrdering] Add additional test for #79161 (NFC)	2024-01-24 10:46:11 +01:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes future and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Yingwei Zheng	1228becf7d	[FuncAttrs] Deduce `noundef` attributes for return values (#76553) This patch deduces `noundef` attributes for return values. IIUC, a function returns `noundef` values iff all of its return values are guaranteed not to be `undef` or `poison`. Definition of `noundef` from LangRef: ``` noundef This attribute applies to parameters and return values. If the value representation contains any undefined or poison bits, the behavior is undefined. Note that this does not refer to padding introduced by the type’s storage representation. ``` Alive2: https://alive2.llvm.org/ce/z/g8Eis6 Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|+0.01%\|+0.01%\|-0.01%\|+0.01%\|+0.03%\|-0.04%\|+0.01%\| The motivation of this patch is to reduce the number of `freeze` insts and enable more optimizations.	2023-12-31 20:44:48 +08:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
Nikita Popov	cf47af493b	[InstCombine] Generalize folds for inversion of icmp operands (#74317) We have a bunch of folds that basically perform X pred Y to ~Y pred ~X for various special cases where this saves an instruction. Generalize these folds to use isFreeToInvert(). We have to make sure that we consume an instruction in either of the inversions, otherwise we're just going to swap the icmp back and forth. Fixes https://github.com/llvm/llvm-project/issues/74302.	2023-12-08 11:25:41 +01:00
Nikita Popov	d77067d08a	[ValueTracking] Add dominating condition support in computeKnownBits() (#73662) This adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementation uses a DomConditionCache, which stores which branches may provide information that is relevant for a given value. DomConditionCache is similar to AssumptionCache, but does not try to do any kind of automatic tracking. Relevant branches have to be explicitly registered and invalidated values explicitly removed. The necessary tracking is done inside InstCombine. The reason why this doesn't just do exactly the same thing as AssumptionCache is that a lot more transforms touch branches and branch conditions than assumptions. AssumptionCache is an immutable analysis and mostly gets away with this because only a handful of places have to register additional assumptions (mostly as a result of cloning). This is very much not the case for branches. This change regresses compile-time by about ~0.2%. It also improves stage2-O0-g builds by about ~0.2%, which indicates that this change results in additional optimizations inside clang itself. Fixes https://github.com/llvm/llvm-project/issues/74242.	2023-12-06 14:17:18 +01:00
Craig Topper	7ec4f6094e	[InstCombine] Infer disjoint flag on Or instructions. (#72912) The disjoint flag was recently added to IR in #72583 We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.	2023-12-02 14:11:12 -08:00
Craig Topper	03d4a9d94d	[InstCombine] Set disjoint flag when turning Add into Or. (#72702) The disjoint flag was recently added to IR in #72583	2023-11-27 12:54:11 -08:00
Yingwei Zheng	dc6d077396	[CVP] Infer nneg on existing zext (#72052) This patch infers `nneg` flags for existing zext instructions in CVP. After https://github.com/llvm/llvm-project/pull/71534 and this patch, we can drop `zext -> zext nneg` transform in `RISCVCodeGenPrepare`: `40671bbdef/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp (L74-L83)` This is an alternative to #72049.	2023-11-13 22:41:37 +08:00
dewen	3b82336188	Revert "[PM] Execute IndVarSimplifyPass precede RessociatePass" (#71617) Reverts llvm/llvm-project#71054	2023-11-08 09:22:55 +08:00
dewen	e4d27d7f32	[PM] Execute IndVarSimplifyPass precede RessociatePass (#71054) ReassociatePass may clear nsw/nuw flags of some instructions, which may have side effects on optimizations in IndVarSimplifyPass.	2023-11-08 09:21:17 +08:00
Nikita Popov	30240e428f	[PhaseOrdering] Regenerate test checks (NFC)	2023-10-12 14:40:13 +02:00
DianQK	2d1e8a03f5	[EarlyCSE] Compare GEP instructions based on offset (#65875) Closes #65763. This will provide more opportunities for constant propagation for subsequent optimizations.	2023-09-20 06:14:45 +08:00
Alexey Bataev	c619222ea4	[SLP]Use common logic for cost estimation of the alternate vector nodes. We can use buildShuffleEntryMask() to build the shuffle mask correctly not only for the alternate nodes with reuses, but also for the nodes without reused scalars. It allows better to estimate the cost of the node and emit better code. Differential Revision: https://reviews.llvm.org/D157413	2023-08-09 11:50:39 -07:00
Florian Hahn	707359ecf5	Recommit "[LV] Re-use existing broadcast value for live-ins." This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53. Recommits eea9258648ce with a fix to only erase the instruction from the first part if it is defined outside the loop. This fixes a use-after-free error reported.	2023-08-01 15:54:02 +01:00
Martin Storsjö	245ec675a4	Revert "[LV] Re-use existing broadcast value for live-ins." This reverts commit eea9258648ce73507f6f85c395de978af659d498. That commit triggered crashes in the following testcase: $ cat reduced.c typedef struct { int a[8] } b; typedef struct { b c; short d } e; void f() { int g; char h; e i = f; short j = i->d; int a = i->c->a[0]; for (;;) for (; g < a; g++) { h = j * i->d >> 8; h++; } } $ clang -target aarch64-linux-gnu -w -c -O2 reduced.c	2023-07-25 10:35:41 +03:00
Florian Hahn	eea9258648	[LV] Re-use existing broadcast value for live-ins. When requesting a vector value for a live-in, we can re-use the broadcast of the live-in of part 0 for parts > 0.	2023-07-24 11:50:47 +01:00
Arthur Eubanks	457dc72fdd	Reland [InstCombine] Infer inbounds for more GEPs of dereferenceable pointers Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments. Fixed clang test broken in initial land. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153815	2023-06-27 09:31:20 -07:00
Arthur Eubanks	0f9df062ec	Revert "[InstCombine] Infer inbounds for more GEPs of dereferenceable pointers" This reverts commit cd43b19c0127d80f3543803359db0f03e363e893. Breaks clang/test/CodeGenOpenCL/builtins-amdgcn.cl.	2023-06-27 09:27:15 -07:00
Arthur Eubanks	cd43b19c01	[InstCombine] Infer inbounds for more GEPs of dereferenceable pointers Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153815	2023-06-27 09:13:00 -07:00
Arthur Eubanks	7d6b8249fa	[test] Regenerate test checks	2023-06-20 18:20:52 -07:00
Noah Goldstein	3391bdc255	Revert "[FunctionAttrs] Propagate some func/arg/ret attributes from caller to callsite (WIP)" Accidental commit/push! This reverts commit 4fa971ff62c3c48c606b792c572c03bd4d5906ee.	2023-06-13 00:53:31 -05:00
Noah Goldstein	4fa971ff62	[FunctionAttrs] Propagate some func/arg/ret attributes from caller to callsite (WIP) This is the consolidation of D151644 and D151943 moved from InstCombine to FunctionAttrs. This is based on discussion in the above patches as well as D152081 (Attributor). This patch was written in a way so it can have an immediate impact in currently active passes (FunctionAttrs), but should be easy to port elsewhere (Attributor or Inliner) if that makes more sense later on. Some function attributes imply the attribute for all/some instructions in the function. These attributes can be safely propagated to callsites within the function that are missing the attribute. This can be useful when 1) analyzing individual instructions in a function and 2) if the original caller is later inlined, as if the attributes are not propagated, they will be lost. This patch implements propagation in a new class/file `InferCallsiteAttrs` which can hypothetically be included elsewhere. At the moment this patch infers the following: Function Attributes: - mustprogress - nofree - willreturn - All memory attributes (readnone, readonly, writeonly, argmem, etc...) - The memory attributes are only propagated IFF the set of pointers available to the callsite is the same as the set available outside the caller (i.e. no local memory arguments from alloca or local malloc like functions). Argument Attributes: - noundef - nonnull - nofree - readnone - readonly - writeonly - nocapture - nocapture is only propagated IFF the set of pointers available to the callsite is the same as the set available outside the caller and it's guaranteed that between the callsite and function return, the state of any capture pointers will not change (so the nocaptured guarantee of the caller has been met by the instruction preceding the callsite and will not change). Arguments are only propagated to callsite arguments that are also function arguments, but not derived values. Return Attributes: - noundef - nonnull Return attributes are only propagated if the callsite's return value is used as the caller's return and execution is guaranteed to pass from callsite to return. The compile time hit of this for -O3 and -O3+thinLTO is ~[.02, .37]% regression. Proper LTO, however, has more significant regressions (up to 3.92%): https://llvm-compile-time-tracker.com/compare.php?from=94407e1bba9807193afde61c56b6125c0fc0b1d1&to=79feb6e78b818e33ec69abdc58c5f713d691554f&stat=instructions:u Differential Revision: https://reviews.llvm.org/D152226	2023-06-13 00:47:43 -05:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
luxufan	f470922a29	Revert "Revert "[ValutTracking] Use isGuaranteedNotToBePoison in impliesPoison"" This reverts commit 706e8110573c83f140a63b40803d6370c86c1414.	2023-05-10 14:35:55 +08:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Simon Pilgrim	aa754f7e0f	[IR] llvm::createMinMaxOp - create integer min/max intrinsics instead of icmp/sel Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together. This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment. Differential Revision: https://reviews.llvm.org/D148221	2023-04-13 16:40:43 +01:00
Simon Pilgrim	07c5e175f6	[PhaseOrdering] Add test case for Issue #61061	2023-04-01 13:27:16 +01:00
Sanjay Patel	6c7b2eef47	[PhaseOrdering] add test for vector load and cast transforms; NFC issue #51397	2023-03-01 13:07:16 -05:00
Alexey Bataev	e03d254bbd	[SLP]Do not reduce repeated values, use scalar red ops instead. Metric: size..text size..text results results0 diff SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test 445.00 461.00 3.6% SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 428477.00 428445.00 -0.0% External/SPEC/CFP2006/447.dealII/447.dealII.test 618849.00 618785.00 -0.0% For all tests some extra code was optimized, GCC-C-execute has some more inlining after Differential Revision: https://reviews.llvm.org/D132261	2023-02-17 07:19:35 -08:00
Florian Hahn	68469a80cb	[LV] Disable runtime unrolling for vectorized loops. This patch adds metadata to disable runtime unrolling to the vectorized loop. If runtime unrolling/interleaving is considered profitable, LV will interleave the loop directly. There should be no need to perform runtime unrolling at a later stage. Note that we already add metadata to disable runtime unrolling to the scalar loop after vectorization. The additional unrolling unnecessarily increases code size and compile time. In addition to that we have several bug reports of unnecessary runtime unrolling for vectorized loops, e.g. PR40961 Compile-time improvements: NewPM-O3: -1.04% NewPM-ReleaseThinLTO: -0.59% NewPM-ReleaseLTO-g: -0.97% https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u Fixes #40306. Reviewed By: lebedev.ri, nikic Differential Revision: https://reviews.llvm.org/D115261	2023-01-06 10:56:17 +00:00
Sanjay Patel	d5f8878a6e	[InstCombine] canonicalize insertelement order based on index This puts lower insert indexes before higher. This is independent of endian, so it requires an adjustment to a fold added with 4446f71ce392, but it makes that fold more robust. That's also where this patch was suggested - D139668. This matches what we already do in DAGCombiner, but there is one more constraint because there's an existing canonicalization for insert-of-scalar-constant. I'm not sure if that is still needed, so it may be adjusted/removed as a follow-up.	2022-12-18 07:08:48 -05:00
Sanjay Patel	4446f71ce3	[InstCombine] try to fold a pair of insertelements into one insertelement This replaces patches that tried to convert related patterns to shuffles (D138872, D138873, D138874 - reverted/abandoned) but caused codegen problems and were questionable as a canonicalization because an insertelement is a simpler op than a shuffle. This detects a larger pattern -- insert-of-insert -- and replaces with another insert, so this hopefully does not cause any problems. As noted by TODO items in the code and tests, this could go a lot further. But this is enough to reduce the motivating test from issue #17113. Example proofs: https://alive2.llvm.org/ce/z/NnUv3a I drafted a version of this for AggressiveInstCombine, but it seems that would uncover yet another phase ordering gap. If we do generalize this to handle the full range of potential patterns, that may be worth looking at again. Differential Revision: https://reviews.llvm.org/D139668	2022-12-12 10:39:58 -05:00
Sanjay Patel	05dbdb0088	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try)" This reverts commit e71b81cab09bf33e3b08ed600418b72cc4117461. As discussed in the planned follow-on to this patch (D138874), this and the subsequent patches in this set can cause trouble for the backend, and there's probably no quick fix. We may even want to canonicalize in the opposite direction (towards insertelt).	2022-12-08 14:16:46 -05:00
Roman Lebedev	75d1a815c3	[NFC] Port all PhaseOrdering tests to `-passes=` syntax	2022-12-08 02:38:50 +03:00
Sanjay Patel	e71b81cab0	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try) The first attempt was reverted because a clang test changed unexpectedly - the file is already marked with a FIXME, so I just updated it this time to pass. Original commit message: This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 14:52:20 -05:00
Sanjay Patel	5eacdcff06	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1" This reverts commit a4c466766db77cd1fb42d7f98f32bb87a3d38829. This broke clang tests that are wrongly dependent on the optimizer.	2022-11-30 14:10:50 -05:00
Sanjay Patel	a4c466766d	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 13:22:04 -05:00
Sanjay Patel	c7bd82dfd8	[PhaseOrdering] add test for vector load combining; NFC This is another example from issue #17113	2022-11-28 16:00:06 -05:00
Matt Arsenault	1c55cc600e	PhaseOrdering: Convert tests to opaque pointers Required manually running update_test_checks: AArch64/hoisting-sinking-required-for-vectorization.ll AArch64/peel-multiple-unreachable-exits-for-vectorization.ll ARM/arm_mult_q15.ll X86/hoist-load-of-baseptr.ll X86/spurious-peeling.ll	2022-11-27 21:26:41 -05:00


235 Commits