llvm-project

Author	SHA1	Message	Date
Florian Hahn	fbcf8a8cbb	[ConstraintElim] Add (UGE, var, 0) to unsigned system for new vars. (#76262) The constraint system used for ConstraintElimination assumes all variables to be signed. This can cause missed optimizations in the unsigned system, due to missing the information that all variables are unsigned (non-negative). Variables can be marked as non-negative by adding Var >= 0 for all variables. This is done for arguments on ConstraintInfo construction and after adding new variables. This handles cases like the ones outlined in https://discourse.llvm.org/t/why-does-llvm-not-perform-range-analysis-on-integer-values/74341 The original example shared above is now handled without this change, but adding another variable means that instcombine won't be able to simplify examples like https://godbolt.org/z/hTnra7zdY Adding the extra variables comes with a slight compile-time increase https://llvm-compile-time-tracker.com/compare.php?from=7568b36a2bc1a1e496ec29246966ffdfc3a8b87f&to=641a47f0acce7755e340447386013a2e086f03d9&stat=instructions:u stage1-O3: +0.04%, stage1-ReleaseThinLTO: +0.07%, stage1-ReleaseLTO-g: +0.05%, stage1-O0-g: +0.02%, stage2-O3: +0.05%, stage2-O0-g: +0.05%, stage2-clang: +0.05% https://github.com/llvm/llvm-project/pull/76262	2023-12-23 15:53:48 +01:00
Florian Hahn	e9a56ab316	[PhaseOrdering] Add test with removable chained conditions. Based on https://godbolt.org/z/hTnra7zdY, which is a slightly more complicated version of the example from https://discourse.llvm.org/t/why-does-llvm-not-perform-range-analysis-on-integer-values/74341	2023-12-22 19:44:20 +00:00
Nikita Popov	273a0c9c07	[PhaseOrdering] Add data layout to test (NFC) Needed for switch to lookup table optimization.	2023-12-20 11:49:34 +01:00
Nikita Popov	5ab5810054	[PhaseOrdering] Add additional test for switch with GEPs (NFC)	2023-12-20 11:41:46 +01:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
Nikita Popov	cf47af493b	[InstCombine] Generalize folds for inversion of icmp operands (#74317 ) We have a bunch of folds that basically perform X pred Y to ~Y pred ~X for various special cases where this saves an instruction. Generalize these folds to use isFreeToInvert(). We have to make sure that we consume an instruction in either of the inversions, otherwise we're just going to swap the icmp back and forth. Fixes https://github.com/llvm/llvm-project/issues/74302.	2023-12-08 11:25:41 +01:00
Nikita Popov	d77067d08a	[ValueTracking] Add dominating condition support in computeKnownBits() (#73662) This adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementation uses a DomConditionCache, which stores which branches may provide information that is relevant for a given value. DomConditionCache is similar to AssumptionCache, but does not try to do any kind of automatic tracking. Relevant branches have to be explicitly registered and invalidated values explicitly removed. The necessary tracking is done inside InstCombine. The reason why this doesn't just do exactly the same thing as AssumptionCache is that a lot more transforms touch branches and branch conditions than assumptions. AssumptionCache is an immutable analysis and mostly gets away with this because only a handful of places have to register additional assumptions (mostly as a result of cloning). This is very much not the case for branches. This change regresses compile-time by about 0.2%. It also improves stage2-O0-g builds by about 0.2%, which indicates that this change results in additional optimizations inside clang itself. Fixes https://github.com/llvm/llvm-project/issues/74242.	2023-12-06 14:17:18 +01:00
Craig Topper	7ec4f6094e	[InstCombine] Infer disjoint flag on Or instructions. (#72912 ) The disjoint flag was recently added to IR in #72583 We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.	2023-12-02 14:11:12 -08:00
Craig Topper	03d4a9d94d	[InstCombine] Set disjoint flag when turning Add into Or. (#72702 ) The disjoint flag was recently added to IR in #72583	2023-11-27 12:54:11 -08:00
Florian Hahn	4ccdab3636	[ConstraintElim] Use isKnownNonNegative for condition transfer. (#72879 ) Use isKnownNonNegative for information transfer. This can improve results, in cases where ValueTracking can infer additional non-negative info, e.g. for phi nodes. This allows simplifying the check from https://github.com/llvm/llvm-project/issues/63126 by ConstraintElimination. It is also simplified by IndVarSimplify now; note the changes in llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll, due to this now being simplified earlier.	2023-11-21 10:09:35 +00:00
Florian Hahn	10c0166909	[PhaseOrdering] Add tests where early sinking prevents if-conversion.	2023-11-16 20:31:21 +00:00
Yingwei Zheng	e8fe15ccf1	[InstCombine] Add exact flags for ext idiom `shr (shl X, Y), Y` (#72483 ) This patch adds exact flags for sext/zext idiom `shr (shl X, Y), Y`. Alive2: https://alive2.llvm.org/ce/z/xYFpfB We can generalize it to handle pattern `shr (shl X, Y), Z` with `Y u>= Z` (e.g., non-splat vectors). But I don't think it's worth the effort. This missed optimization is discovered with the help of https://github.com/AliveToolkit/alive2/pull/962.	2023-11-16 17:30:01 +08:00
Yingwei Zheng	dc6d077396	[CVP] Infer nneg on existing zext (#72052 ) This patch infers `nneg` flags for existing zext instructions in CVP. After https://github.com/llvm/llvm-project/pull/71534 and this patch, we can drop `zext -> zext nneg` transform in `RISCVCodeGenPrepare`: `40671bbdef/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp (L74-L83)` This is an alternative to #72049.	2023-11-13 22:41:37 +08:00
Léonard Oest O'Leary	ff36411b23	[InstCombine] Use zext's nneg flag for icmp folding (#70845) This PR fixes https://github.com/llvm/llvm-project/issues/55013: the max intrinsic is not generated for this simple loop case: https://godbolt.org/z/hxz1xhMPh. This is caused by an ICMP not being folded into a select, thus not generating the max intrinsic. For the story: since LLVM 14, the SCCP pass got smarter by folding sext into zext for positive ranges: https://reviews.llvm.org/D81756. After this change, InstCombine was sometimes unable to fold ICMP correctly as both of the arguments pointed to mismatched zext/sext. To fix this, @rotateright implemented this fix: https://reviews.llvm.org/D124419 that tries to resolve the mismatch by knowing if the argument of a zext is positive (in which case, it is like a sext) by using ValueTracking; however, ValueTracking is not smart enough to infer that the value is positive in some cases. Recently, @nikic implemented #67982 which keeps the information that a zext is non-negative. This PR simply uses this information to do the folding accordingly. TLDR: This PR uses the recent nneg tag on zext to fold the icmp accordingly in instcombine. This PR also contains test cases for sext/zext folding with InstCombine as well as x86 regression tests for the max/min case.	2023-11-13 00:53:53 +08:00
dewen	3b82336188	Revert "[PM] Execute IndVarSimplifyPass precede RessociatePass" (#71617 ) Reverts llvm/llvm-project#71054	2023-11-08 09:22:55 +08:00
dewen	e4d27d7f32	[PM] Execute IndVarSimplifyPass precede RessociatePass (#71054 ) ReassociatePass may clear nsw/nuw flags of some instructions, which may have side effects on optimizations in IndVarSimplifyPass.	2023-11-08 09:21:17 +08:00
Johannes Doerfert	3de645efe3	[OpenMP][NFC] Split the reduction buffer size into two components Before we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts, the size of the reduction data (=all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have fewer teams than the maximal length. It also allows us to move code from clang's codegen into the runtime as we now know how large the reduction data is.	2023-11-06 11:50:41 -08:00
Florian Hahn	ab6bd9436a	[ConstraintElim] Add tests for additional SGT->UGT transfer. Test cases inspired by https://github.com/llvm/llvm-project/issues/63126.	2023-11-03 13:38:39 +00:00
Craig Topper	55c9f24344	[CVP] Infer nneg on zext when forming from non-negative sext. (#70715 ) Builds on #67982 which recently introduced the nneg flag on a zext instruction.	2023-10-30 13:48:27 -07:00
Philip Reames	3f2ed812f0	[InstCombine] Infer nneg on zext when forming from non-negative sext (#70706 ) Builds on #67982 which recently introduced the nneg flag on a zext instruction. InstCombine is one of our largest canonicalizers of zext from non-negative sext instructions, so set the flag there.	2023-10-30 12:09:43 -07:00
Amara Emerson	2228b35f93	Revert "Revert "[InstCombine] Add oneuse checks to shr + cmp constant folds."" This reverts commit d37b283cdd37feca5ea71456cf350005add268e7. There was a simple logic bug in the else path. Tests codegen is different with the fix.	2023-10-28 03:12:15 -07:00
Amara Emerson	d37b283cdd	Revert "[InstCombine] Add oneuse checks to shr + cmp constant folds." This reverts commit a66051c68a43af39f9fd962f71d58ae0efcf860d. This seems to have caused issue #70509 so reverting until I have time to investigate.	2023-10-27 14:27:58 -07:00
Mehdi Amini	f390a76b7e	Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )"" This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab. Reapply the original commit, the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.	2023-10-26 17:30:01 -07:00
Mehdi Amini	ddbaa11e9f	Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )" This reverts commit c2a1249a8257ed033a98e32e425539c6da6700ec. The MLIR bots are broken with an omp test failure.	2023-10-26 17:25:20 -07:00
Johannes Doerfert	c2a1249a82	[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 ) The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.	2023-10-26 14:46:55 -07:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One case that currently cannot be differentiated from a missing datalayout is `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Amara Emerson	a66051c68a	[InstCombine] Add oneuse checks to shr + cmp constant folds. This change has virtually no code size regressions on the llvm test suite (+ SPECs) while having these improvements (measured with -Os on Darwin arm64): External/S.../CFP2006/450.soplex/450.soplex 214024.00 213920.00 -0.0%; External/S...7speed/641.leela_s/641.leela_s 93412.00 93348.00 -0.1%; External/S...17rate/541.leela_r/541.leela_r 93412.00 93348.00 -0.1%; MultiSourc.../Applications/JM/lencod/lencod 426044.00 425748.00 -0.1%; MultiSourc...rks/mediabench/gsm/toast/toast 20436.00 20416.00 -0.1%; MultiSourc...ench/telecomm-gsm/telecomm-gsm 20436.00 20416.00 -0.1%; MultiSourc...Prolangs-C/assembler/assembler 16172.00 16156.00 -0.1%; MultiSourc...nch/mpeg2/mpeg2dec/mpeg2decode 35332.00 35256.00 -0.2%; SingleSour...Adobe-C++/stepanov_abstraction 6904.00 6888.00 -0.2%; External/SPEC/CINT2000/254.gap/254.gap 366060.00 365132.00 -0.3%; MultiSourc...-ProxyApps-C++/PENNANT/PENNANT 79688.00 79484.00 -0.3%; External/S...NT2006/464.h264ref/464.h264ref 352044.00 351132.00 -0.3%; SingleSour...arks/Adobe-C++/functionobjects 15524.00 15480.00 -0.3%; SingleSour...arks/Adobe-C++/stepanov_vector 10728.00 10696.00 -0.3%; SingleSour...ks/Misc-C++/stepanov_container 16900.00 16848.00 -0.3%; MultiSource/Applications/oggenc/oggenc 124184.00 123780.00 -0.3%; SingleSour...tout-C++/Shootout-C++-wordfreq 7060.00 7036.00 -0.3%; MultiSourc...ity-rijndael/security-rijndael 8976.00 8936.00 -0.4%; MultiSource/Benchmarks/McCat/18-imp/imp 9816.00 9772.00 -0.4%; SingleSour...chmarks/Misc-C++/stepanov_v1p2 1772.00 1764.00 -0.5%; MultiSourc...iabench/g721/g721encode/encode 5492.00 5464.00 -0.5%; MultiSourc...rks/McCat/03-testtrie/testtrie 1364.00 1344.00 -1.5%; SingleSour.../execute/GCC-C-execute-pr42833 400.00 364.00 -9.0%. Doing so also prevents a regression described in https://reviews.llvm.org/D143624 Differential Revision: https://reviews.llvm.org/D149918	2023-10-26 11:36:10 -07:00
Amara Emerson	7ba99fd75e	[InstCombine][NFC] Precommit tests for https://reviews.llvm.org/D149918	2023-10-20 13:54:27 -07:00
Nikita Popov	30240e428f	[PhaseOrdering] Regenerate test checks (NFC)	2023-10-12 14:40:13 +02:00
Nikita Popov	37441f1ae6	[PhaseOrdering] Add test for switch with different GEP types (NFC)	2023-10-12 14:01:58 +02:00
Florian Hahn	56a3e49a00	[ConstraintElim] Support decrementing inductions with step -1. (#68644 ) Extend the logic in addInfoForInductions to support decrementing inductions with a step of -1. Fixes #64881.	2023-10-10 09:37:27 -07:00
Alex Richardson	e86d6a43f0	Regenerate test checks for tests affected by D141060	2023-10-04 10:51:35 -07:00
Florian Hahn	98e016d997	[ConstraintElim] Handle trivial (ICMP_ULE, 0, B) in doesHold. D152730 may add trivial pre-conditions of the form (ICMP_ULE, 0, B), which won't be handled automatically by the constraint system, because we don't add Var >= 0 for all variables in the unsigned system. Handling the trivial condition explicitly here avoids having to increase the number of rows in the system per variable. https://alive2.llvm.org/ce/z/QC92ur Depends on D152730. Fixes #63125. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D158776	2023-09-27 12:11:28 +01:00
Florian Hahn	e6a1657fa3	[ConstraintElim] Add A < B if A is an increasing phi for A != B. This patch adds logic to add additional facts for A != B, if A is a monotonically increasing induction phi. The motivating use case for this is removing checks when using iterators with hardened libc++, e.g. https://godbolt.org/z/zhKEP37vG. The patch pulls in SCEV to detect AddRecs. If possible, the patch adds the following facts for an AddRec phi PN with StartValue as incoming value from the loop preheader and B being an upper bound for PN from a condition in the loop header. * (ICMP_UGE, PN, StartValue) * (ICMP_ULT, PN, B) [if (ICMP_ULE, StartValue, B)] The patch also adds an optional precondition to FactOrCheck (the new DoesHold field), which can be used to only add a fact if the precondition holds at the point the fact is added to the constraint system. Depends on D151799. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D152730	2023-09-27 11:00:28 +01:00
Florian Hahn	04f9a8a7d6	[ConstraintElim] Move just before loop simplification pipeline. Adjust the pipeline slightly to move ConstraintElim just before the loop simplification pipeline. This increases the number of cases where SCEV can be preserved in the future. This also enables slightly more opportunities, by benefiting from earlier CFG simplifications, which allow more conditions to be added. Reviewed By: nikic, antoniofrighetto Differential Revision: https://reviews.llvm.org/D158843	2023-09-22 14:31:08 +01:00
Florian Hahn	e6ebd2890f	[ConstraintElim] Add phase ordering tests for pipeline adjustment. Phaseordering tests for pipeline adjustment in D158843.	2023-09-21 21:01:41 +01:00
Dhruv Chawla	3e992d81af	[InferAlignment] Enable InferAlignment pass by default This gives an improvement of 0.6%: https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u Differential Revision: https://reviews.llvm.org/D158600	2023-09-20 12:08:52 +05:30
DianQK	2d1e8a03f5	[EarlyCSE] Compare GEP instructions based on offset (#65875 ) Closes #65763. This will provide more opportunities for constant propagation for subsequent optimizations.	2023-09-20 06:14:45 +08:00
Florian Hahn	1a2e344b10	[PhaseOrdering] Add additional end-to-end range check removal tests. Tests for https://github.com/llvm/llvm-project/issues/63125 https://github.com/llvm/llvm-project/issues/64881	2023-08-24 19:22:17 +01:00
Changpeng Fang	c1803d5366	[FunctionAttrs] Unconditionally perform argument attribute inference in the first function-attrs pass Summary: Argument attributes like NoAlias and ReadOnly could affect memoryssa and thus earlyCSE in the function simplification pipeline. https://reviews.llvm.org/D145210 adjusted PostOrderFunctionAttrs placement and caused the argument attributes not to be inferred for use in the pipeline. This work (initiated by @nikic) unconditionally performs argument attribute inference in the first function-attrs pass. Reviewers: aeubanks and nikic Differential Revision: https://reviews.llvm.org/D156397	2023-08-09 17:49:14 -07:00
Alexey Bataev	c619222ea4	[SLP]Use common logic for cost estimation of the alternate vector nodes. We can use buildShuffleEntryMask() to build the shuffle mask correctly not only for the alternate nodes with reuses, but also for the nodes without reused scalars. This allows a better estimate of the cost of the node and better emitted code. Differential Revision: https://reviews.llvm.org/D157413	2023-08-09 11:50:39 -07:00
DianQK	c3f227ead6	[TailCallElim] Remove the readonly attribute of byval. When eliminating a tail call, we modify the values of the arguments. Therefore, if the byval parameter has a readonly attribute, we have to remove it. It is safe because, from the perspective of a caller, the byval parameter is always treated as "readonly," even if the readonly attribute is removed. Fixes #64289. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D156793	2023-08-09 07:07:47 +08:00
Nikita Popov	b92711931d	[PhaseOrdering] Add test for quant_4x4 vectorization (NFC) Failure to vectorize this led to a revert of D156532, so add a PhaseOrdering test to prevent this from happening again.	2023-08-08 17:36:07 +02:00
David Green	05b4310c8a	Revert "[Pipelines] Perform hoisting prior to GVN" This reverts commit 1f37088679a5c2416707d477093950e48148d430 as it causes a large regression in x264, and some other regressions in downstream embedded benchmarks under LTO.	2023-08-08 15:32:24 +01:00
Nikita Popov	1f37088679	[Pipelines] Perform hoisting prior to GVN We currently only enable hoisting in the last SimplifyCFG run of the function simplification pipeline. In particular this happens after GVN, which means that instructions that were identical (and thus hoistable) prior to GVN might no longer be so after it ran, due to equality replacements (see the phase ordering test). The history here is that D84108 restricted hoisting to the very late (module optimization) pipeline only. Then D101468 went back on that, and also performed it at the end of function simplification. This patch goes one step further and allows it prior to GVN. Importantly, we still don't perform hoisting before LoopRotate, which was the original motivation for delaying it. Differential Revision: https://reviews.llvm.org/D156532	2023-08-07 10:06:00 +02:00
Florian Hahn	707359ecf5	Recommit "[LV] Re-use existing broadcast value for live-ins." This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53. Recommits eea9258648ce with a fix to only erase the instruction from the first part if it is defined outside the loop. This fixes a use-after-free error reported.	2023-08-01 15:54:02 +01:00
Nikita Popov	d01aec4c76	[InstCombine] Set dead phi inputs to poison in more cases Set phi inputs to poison whenever we find a dead edge (either during initial worklist population or the main InstCombine run), instead of only doing this for successors of dead blocks. This means that the phi operand is set to poison even for critical edges without an intermediate block. There are quite a few test changes, because the pattern is fairly common in vectorizer output, for cases where we know the vectorized loop will be entered.	2023-08-01 11:53:47 +02:00
Nikita Popov	7c64449e44	[LoopVectorize] Regenerate test checks (NFC) To reduce spurious diffs in future changes.	2023-08-01 11:30:55 +02:00
Nikita Popov	41895843b5	[InstCombine] Only perform one iteration InstCombine is a worklist-driven algorithm, which works roughly as follows: * All instructions are initially pushed to the worklist. The initial order is in RPO program order. * All newly inserted instructions get added to the worklist. * When an instruction is folded, its users get added back to the worklist. * When the use-count of an instruction decreases, it gets added back to the worklist. * And a few other heuristics on when we should revisit instructions. On top of the worklist algorithm, InstCombine layers an additional fix-point iteration: If any fold was performed in the previous iteration, then InstCombine will re-populate the worklist from scratch and fold the entire function again. This continues until a fix-point is reached. In the vast majority of cases, InstCombine will reach a fix-point within a single iteration. However, a second iteration is performed to verify that this is indeed the fixpoint. We can see this in the statistics for llvm-test-suite: "instcombine.NumOneIteration": 411380, "instcombine.NumTwoIterations": 117921, "instcombine.NumThreeIterations": 236, "instcombine.NumFourOrMoreIterations": 2, The way to read these numbers is that in 411380 cases, InstCombine performs no folds. In 117921 cases it performs a fold and reaches the fix-point within one iteration (the second iteration verifies the fixpoint). In the remaining 238 cases, more than one iteration is needed to reach the fixpoint. In other words, only in 0.04% of cases are additional iterations needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine performs a completely useless extra iteration to verify the fix point. This patch removes the fixpoint iteration from InstCombine, and always performs only a single iteration. This results in a major compile-time improvement of around 4% at negligible codegen impact. This explicitly does accept that we will not reach a fixpoint in all cases. However, this is mitigated by two factors: First, the data suggests that this happens very rarely in practice. Second, InstCombine runs many times during the optimization pipeline (8 times even without LTO), so there are many chances to recover such cases. In order to prevent accidental optimization regressions in the future, this implements a verify-fixpoint option, which is enabled by default when instcombine is specified in -passes and disabled when InstCombinePass() is constructed from C++. This means that test cases need to explicitly use the no-verify-fixpoint option if they fail to reach a fixed point (for a well-understood reason we cannot / do not want to avoid). Differential Revision: https://reviews.llvm.org/D154579	2023-07-31 10:56:49 +02:00
Nikita Popov	bbe2887f5e	[PhaseOrdering] Add test for GVN/hoist order (NFC) Based on https://discourse.llvm.org/t/rfc-what-are-the-blockers-to-turning-on-gvnsink-by-default/72326.	2023-07-28 15:13:51 +02:00
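
The percentages quoted in the "[InstCombine] Only perform one iteration" entry (41895843b5) can be re-derived from the four statistics counters it lists. The following is only a minimal arithmetic sketch; the variable names are chosen here for illustration and are not LLVM identifiers.

```python
# Counters quoted in the message of commit 41895843b5 (llvm-test-suite statistics).
one_iteration = 411380     # InstCombine performed no folds
two_iterations = 117921    # folds in the first pass; the second pass only verified the fixpoint
three_iterations = 236
four_or_more = 2

total = one_iteration + two_iterations + three_iterations + four_or_more

# Runs that actually needed more than one folding iteration.
needed_extra = three_iterations + four_or_more
pct_needed_extra = 100 * needed_extra / total

# Runs whose extra iteration only confirmed an already-reached fixpoint.
pct_verify_only = 100 * two_iterations / total

print(f"{needed_extra} of {total} runs ({pct_needed_extra:.2f}%) needed extra iterations")
print(f"{pct_verify_only:.1f}% of runs spent an iteration only on verification")
```

With these inputs the two figures come out at roughly 0.04% and 22.3%, matching the numbers stated in the commit message.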


548 Commits