llvm-project

Author	SHA1	Message	Date
David Sherwood	905083f3c1	[LTO] Ensure LICM hoists expensive fdiv instructions introduced by InstCombine In the LTO pipeline we run InstCombine after LICM, which is different to what we normally do without LTO. This has the effect of undoing all the great work done by LICM to reduce the cost of the loop when it hoists the fdiv out and replaces it with fmul. When InstCombine runs after LICM it puts the fdiv straight back which, on AArch64 at least, is darn expensive. You can observe this problem in the SPEC2017 benchmark parest if you build with "-Ofast -flto" and the loop-vectoriser uses an unroll factor of 1, which is what often happens when tail-folding is enabled. This is also a problem for scalar loops, or indeed any loop where there is only one use of the preheader fdiv result in the loop. See InstCombinerImpl::visitFMul for the code that sinks the fdiv. I've attempted to fix this by adding another LICM pass for Full LTO after InstCombine. The alternative is to stop InstCombine from sinking the fdiv into loops. See D87479 for a previous discussion on this issue. Differential Revision: https://reviews.llvm.org/D143631	2023-07-07 11:06:24 +00:00
Nikita Popov	336d7281ad	[InstCombine] Preserve inbounds when folding select of GEP The select base, (gep base, offset) to gep base, select (0, offset) fold used to drop inbounds, because the gep base, 0 this introduces might not be inbounds. After the semantics change in D154051, such a GEP is always considered inbounds, which allows us to preserve the flag here. As the PhaseOrdering test demonstrates, this can result in major optimization improvements in some cases. Differential Revision: https://reviews.llvm.org/D154055	2023-07-07 09:56:33 +02:00
Arthur Eubanks	22ca38da25	[ScalarEvolution] Analyze ranges for heap allocations Followup to D153624. Allows for better exit count calculations for loops checking heap allocations against null. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D154001	2023-06-29 09:35:20 -07:00
Arthur Eubanks	b24b6c4a32	[PhaseOrdering] Add test with gep null compare in loop (NFC) Test from D153392 in both the alloca and malloc variants.	2023-06-29 10:19:28 +02:00
Arthur Eubanks	457dc72fdd	Reland [InstCombine] Infer inbounds for more GEPs of dereferenceable pointers Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments. Fixed clang test broken in initial land. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153815	2023-06-27 09:31:20 -07:00
Arthur Eubanks	0f9df062ec	Revert "[InstCombine] Infer inbounds for more GEPs of dereferenceable pointers" This reverts commit cd43b19c0127d80f3543803359db0f03e363e893. Breaks clang/test/CodeGenOpenCL/builtins-amdgcn.cl.	2023-06-27 09:27:15 -07:00
Arthur Eubanks	cd43b19c01	[InstCombine] Infer inbounds for more GEPs of dereferenceable pointers Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153815	2023-06-27 09:13:00 -07:00
Nikita Popov	c0de28b92e	[BasicAA] Don't short-circuit non-capturing arguments This is an alternative to D153464. BasicAA currently assumes that an unescaped alloca cannot be read through non-nocapture arguments of a call, based on the argument that if the argument were based on the alloca, it would not be unescaped. This currently fails in the case where the call is an ephemeral value and as such does not count as a capture. It also happens for calls that are readonly+nounwind+void, though that case tends to not matter in practice, because such calls will get DCEd anyway. Differential Revision: https://reviews.llvm.org/D153511	2023-06-26 12:27:32 +02:00
Florian Hahn	04a7c672ab	[PhaseOrdering] Add test showing mis-compile caused by 17fdaccccf. The test shows a mis-compile where @test gets incorrectly simplified to unreachable. The test case is reduced from a ThinLTO build of Clang, with only the relevant pass sequence included.	2023-06-21 21:15:14 +01:00
Arthur Eubanks	7d6b8249fa	[test] Regenerate test checks	2023-06-20 18:20:52 -07:00
Zhongyunde	cb353dc74e	[LV] Add cost model for simd vector select instructions of type float For simd vector selects, use cmeq + bsl for v2f32/v4f32/v2f64, so their cost is cheap. Fix https://github.com/llvm/llvm-project/issues/63082 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D152523	2023-06-20 13:12:19 +08:00
Arthur Eubanks	3e39cfe5b4	Revert "Revert "InstSimplify: Require instruction be parented"" This reverts commit 0c03f48480f69b854f86d31235425b5cb71ac921. Going to fix forward size regression instead due to more dependent patches needing to be reverted otherwise.	2023-06-16 13:53:31 -07:00
Arthur Eubanks	0c03f48480	Revert "InstSimplify: Require instruction be parented" This reverts commit 1536e299e63d7788f38117b0212ca50eb76d7a3b. Causes large binary size regressions, see comments on https://reviews.llvm.org/rG1536e299e63d7788f38117b0212ca50eb76d7a3b.	2023-06-16 11:24:29 -07:00
Nikita Popov	bf9779798b	[PhaseOrdering] Regenerate test checks (NFC) Just naming changes.	2023-06-14 10:08:46 +02:00
Noah Goldstein	91cdffcb2f	[InstCombine] Transform `(binop1 (binop2 (lshift X,Amt),Mask),(lshift Y,Amt))` If `Mask` and `Amt` are not constants and `binop1` and `binop2` are the same we can transform to: `(binop (lshift (binop X, Y), Amt), Mask)` If `binop` is `add`, `lshift` must be `shl`. If `Mask` and `Amt` are constants `C` and `C1` respectively. We can transform to: `(lshift1 (binop1 (binop2 X, (inv_lshift1 C, C1), Y)), C1)` Saving an instruction IFF: `lshift1` is same opcode as `lshift2` Either `bitwise1` and/or `bitwise2` is `and`. Proofs(1/2): https://alive2.llvm.org/ce/z/BjN-m_ Proofs(2/2): https://alive2.llvm.org/ce/z/bZn5QB This is to help fix the regression caused in D151807 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D152568	2023-06-13 20:08:35 -05:00
Noah Goldstein	3391bdc255	Revert "[FunctionAttrs] Propagate some func/arg/ret attributes from caller to callsite (WIP)" Accidental commit/push! This reverts commit 4fa971ff62c3c48c606b792c572c03bd4d5906ee.	2023-06-13 00:53:31 -05:00
Noah Goldstein	4fa971ff62	[FunctionAttrs] Propagate some func/arg/ret attributes from caller to callsite (WIP) This is the consolidation of D151644 and D151943 moved from InstCombine to FunctionAttrs. This is based on discussion in the above patches as well as D152081 (Attributor). This patch was written in a way so it can have an immediate impact in currently active passes (FunctionAttrs), but should be easy to port elsewhere (Attributor or Inliner) if that makes more sense later on. Some function attributes imply the attribute for all/some instructions in the function. These attributes can be safely propagated to callsites within the function that are missing the attribute. This can be useful when 1) analyzing individual instructions in a function and 2) if the original caller is later inlined, as if the attributes are not propagated, they will be lost. This patch implements propagation in a new class/file `InferCallsiteAttrs` which can hypothetically be included elsewhere. At the moment this patch infers the following: Function Attributes: - mustprogress - nofree - willreturn - All memory attributes (readnone, readonly, writeonly, argmem, etc...) - The memory attributes are only propagated IFF the set of pointers available to the callsite is the same as the set available outside the caller (i.e. no local memory arguments from alloca or local malloc like functions). Argument Attributes: - noundef - nonnull - nofree - readnone - readonly - writeonly - nocapture - nocapture is only propagated IFF the set of pointers available to the callsite is the same as the set available outside the caller and it is guaranteed that between the callsite and function return, the state of any capture pointers will not change (so the nocapture guarantee of the caller has been met by the instruction preceding the callsite and will not change). Argument attributes are only propagated to callsite arguments that are also function arguments, but not derived values. Return Attributes: - noundef - nonnull Return attributes are only propagated if the callsite's return value is used as the caller's return and execution is guaranteed to pass from callsite to return. The compile time hit of this for -O3 and -O3+thinLTO is ~[.02, .37]% regression. Proper LTO, however, has more significant regressions (up to 3.92%): https://llvm-compile-time-tracker.com/compare.php?from=94407e1bba9807193afde61c56b6125c0fc0b1d1&to=79feb6e78b818e33ec69abdc58c5f713d691554f&stat=instructions:u Differential Revision: https://reviews.llvm.org/D152226	2023-06-13 00:47:43 -05:00
Florian Hahn	6162f6e9ff	[ConstraintElim] Add additional monotonic phi tests. Add extra test coverage with cases that showed mis-compiles in earlier versions of an upcoming patch. Also add tests for integer phis	2023-06-10 21:43:06 +01:00
Shivam Gupta	46aba711ab	[InstCombine] (icmp eq A, -1) & (icmp eq B, -1) --> (icmp eq (A&B), -1) This patch adds another icmp fold for the -1 case. This fixes https://github.com/llvm/llvm-project/issues/62311, where we want instcombine to merge all compare instructions together so later passes like simplifycfg and slpvectorize can better optimize this chained comparison. Reviewed By: goldstein.w.n Differential Revision: https://reviews.llvm.org/D151660	2023-06-08 09:00:05 +05:30
Shivam Gupta	0535da6f11	[InstCombine] Add test case for (icmp eq A, -1) & (icmp eq B, -1) --> (icmp eq (A&B), -1); NFC Reviewed By: goldstein.w.n Differential Revision: https://reviews.llvm.org/D151694	2023-06-08 09:00:02 +05:30
Johannes Doerfert	8f4fadd1b4	[OpenMP] Use "kernel" attribute consistently	2023-06-05 16:33:53 -07:00
Florian Hahn	cd2fc73b49	Revert "[ValueTracking][InstCombine] Add a new API to allow to ignore poison generating flags or metadatas when implying poison" This reverts commit 754f3ae65518331b7175d7a9b4a124523ebe6eac. Unfortunately the change can cause regressions due to dropping flags from instructions (like nuw,nsw,inbounds), preventing further optimizations depending on those flags. A simple example is the IR below, where `inbounds` is dropped with the patch and the phase-ordering test added in 7c91d82ab912fae8b. define i1 @test(ptr %base, i64 noundef %len, ptr %p2) { bb: %gep = getelementptr inbounds i32, ptr %base, i64 %len %c.1 = icmp uge ptr %p2, %base %c.2 = icmp ult ptr %p2, %gep %select = select i1 %c.1, i1 %c.2, i1 false ret i1 %select } For more discussion, see D149404.	2023-05-29 15:44:37 +01:00
Florian Hahn	7c91d82ab9	[PhaseOrdering] Add test for loop over span with hardened libc++. Add a slightly reduced test case for a loop iterating over a std::span with libc++ hardening. See https://godbolt.org/z/cKerYq9fY.	2023-05-26 20:58:05 +01:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
luxufan	f470922a29	Revert "Revert "[ValutTracking] Use isGuaranteedNotToBePoison in impliesPoison"" This reverts commit 706e8110573c83f140a63b40803d6370c86c1414.	2023-05-10 14:35:55 +08:00
Dávid Bolvanský	6321e4ddf7	[SimplifyLibCalls] Transform memchr(STR, C, N) to chain of ORs Motivation: ``` #include <string_view> size_t findFirst_ABCDEF(std::string_view sv) { return sv.find_first_of("ABCDEF"); } ``` memchr("ABCDEF", C, 6) != NULL -> (C == 'A' || C == 'B' || C == 'C' || C == 'D' || C == 'E' || C == 'F') != 0 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128011	2023-05-07 15:12:03 +02:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Shilei Tian	d4ecd1241c	Revert "[OpenMP] Introduce kernel environment" This reverts commit 35cfadfbe2decd9633560b3046fa6c17523b2fa9. It makes a couple of buildbots unhappy because of the following test failures: - `Transforms/OpenMP/add_attributes.ll'` - `mapping/declare_mapper_target_data.cpp` on AMDGPU	2023-04-22 20:56:35 -04:00
Shilei Tian	35cfadfbe2	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-04-22 20:46:38 -04:00
Dávid Bolvanský	5084abbea9	[Tests] Added test for D128011	2023-04-14 22:37:55 +02:00
Simon Pilgrim	aa754f7e0f	[IR] llvm::createMinMaxOp - create integer min/max intrinsics instead of icmp/sel Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together. This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment. Differential Revision: https://reviews.llvm.org/D148221	2023-04-13 16:40:43 +01:00
Jeff Byrnes	9b79d0b610	[MergedLoadStoreMotion] Merge stores with conflicting value types Since memory does not have an intrinsic type, we do not need to require value type matching on stores in order to sink them. To facilitate that, this patch finds stores which are sinkable, but have conflicting types, and bitcasts the ValueOperand so they are easily sinkable into a PHINode. Rather than doing fancy analysis to optimally insert the bitcast, we always insert right before the relevant store in the diamond branch. The assumption is that later passes (e.g. GVN, SimplifyCFG) will clean up bitcasts as needed. Differential Revision: https://reviews.llvm.org/D147348	2023-04-04 12:01:29 -07:00
Simon Pilgrim	07c5e175f6	[PhaseOrdering] Add test case for Issue #61061	2023-04-01 13:27:16 +01:00
Jeff Byrnes	7739be7c6b	[ArgPromotion] Remove dead code produced by removing dead arguments ArgPromotion currently produces phantom / dead loads. A good example of this is store-into-inself.ll. First, ArgPromo finds the promotable argument %p in @l. Then it inserts a load of %p in the caller, and passes instead the loaded value / transforms the function body. PromoteMem2Reg is able to optimize out the entire function body, resulting in an unused argument. In a subsequent ArgPromotion pass, it removes the dead argument, resulting in a dead load in the caller. These dead loads may reduce effectiveness of other transformations (e.g. SimplifyCFG, MergedLoadStoreMotion). This patch removes loads and geps that are made dead in the caller after removal of dead args. Differential Revision: https://reviews.llvm.org/D146327	2023-03-23 09:43:35 -07:00
Jeff Byrnes	08622314d2	Precommit tests for D146327	2023-03-22 12:23:28 -07:00
Nikita Popov	fb5683449e	[Pipelines] Restore old DAE position in LTO pipeline This is a partial revert of D128830, restoring the previous position of DeadArgElim in the fat LTO pipeline. The motivation for this is a major code size regression observed in Rust and illustrated in the PhaseOrdering test. This is a conservative fix restoring the previous pipeline order. The real problem is that the LTO pipeline is conceptually broken: It doesn't have a CGSCC function simplification pipeline. The inliner is just being run by itself. This wouldn't be a problem if fat LTO used a standard design where ArgPromotion and DAE are only run after functions have already been simplified by the CGSCC inliner pipeline. Differential Revision: https://reviews.llvm.org/D146051	2023-03-14 17:00:17 +01:00
Nikita Popov	8df140c860	[PhaseOrdering] Add test for DAE/GlobalDCE interaction (NFC)	2023-03-14 15:19:39 +01:00
Arthur Eubanks	0d4a709bb8	[Pipeline] Adjust PostOrderFunctionAttrs placement in simplification pipeline We can infer more attribute information once functions are fully simplified, so move the PostOrderFunctionAttrs pass after the function simplification pipeline. However, just doing this can impact simplification of recursive functions since function simplification takes advantage of function attributes of callees (some LLVM tests are actually impacted by this), so keep a copy of PostOrderFunctionAttrs before the function simplification pipeline that only runs on recursive functions. For example, this fixes the small regression noticed in https://reviews.llvm.org/D128830. This requires some restructuring of the CGSCC NoRerun feature. We need to cache the ShouldNotRunFunctionPassesAnalysis analysis after the simplification is done, which now is after the second PostOrderFunctionAttrs run, rather than after the function simplification pipeline. Compile time impact: https://llvm-compile-time-tracker.com/compare.php?from=33cf40122279342b50f92a3a53f5c185390b6018&to=1bb2a07875634e508a6bdf2ca1b130f55510f060&stat=instructions:u Compile time increase from unconditionally running the first PostOrderFunctionAttrs: https://llvm-compile-time-tracker.com/compare.php?from=1bb2a07875634e508a6bdf2ca1b130f55510f060&to=f4f87e89cc7a35c64e3a103a8036192a84ae002b&stat=instructions:u Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D145210	2023-03-06 09:01:45 -08:00
Arthur Eubanks	bd80dbf284	[test] Precommit test for D145210	2023-03-03 11:22:32 -08:00
Sanjay Patel	6c7b2eef47	[PhaseOrdering] add test for vector load and cast transforms; NFC issue #51397	2023-03-01 13:07:16 -05:00
Alexey Bataev	e03d254bbd	[SLP]Do not reduce repeated values, use scalar red ops instead. Metric: size..text size..text results results0 diff SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test 445.00 461.00 3.6% SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 428477.00 428445.00 -0.0% External/SPEC/CFP2006/447.dealII/447.dealII.test 618849.00 618785.00 -0.0% For all tests some extra code was optimized, GCC-C-execute has some more inlining after Differential Revision: https://reviews.llvm.org/D132261	2023-02-17 07:19:35 -08:00
David Green	86bfeb906e	Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner." This seems to cause large regressions in existing code, as much as 75% slower (4x the time taken). Small always inline functions seem to be used a lot in the cmsis-dsp library. I would add a phase ordering test to show the problems, but one already exists! The llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll was just changed by removing alwaysinline to hide the problems that existed. This reverts commit cae033dcf227aeecf58fca5af6fc7fde1fd2fb4f. This reverts commit 8e33c41e72ad42e4c27f8cbc3ad2e02b169637a1.	2023-02-10 15:01:49 +00:00
Amara Emerson	cae033dcf2	Inlining: Run the legacy AlwaysInliner before the regular inliner. We have several situations where it's beneficial for code size to ensure that every call to an always-inline function is inlined before normal inlining decisions are made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this, it only does it on a per-SCC basis, rather than the whole module. Ensuring that all mandatory inlinings are done before any heuristic based decisions are made just makes sense. Despite being referred to as the "legacy" AlwaysInliner pass, it's already necessary for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0. This also fixes an exponential compile time blow up in https://github.com/llvm/llvm-project/issues/59126 Differential Revision: https://reviews.llvm.org/D143624	2023-02-09 16:49:29 -08:00
Florian Hahn	43acb61a08	Revert "[SCCP] Support NUW/NSW inference for all overflowing binary operators." This reverts commit 024115ab14822a97c09adcd2545c14e78b843b36. I suspect that this may be causing some buildbot bootstrapping failures. Revert while I investigate.	2023-01-28 21:33:28 +00:00
Florian Hahn	024115ab14	[SCCP] Support NUW/NSW inference for all overflowing binary operators. Extend the NUW/NSW inference logic added in 72121a20cd and cdeaf5f28c3dc to all overflowing binary operators. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D142721	2023-01-28 17:40:41 +00:00
Jamie Hill-Daniel	6b9317f52a	[InstCombine] Fold zero check followed by decrement to usub.sat Fold (a == 0) ? 0 : a - 1 into usub.sat(a, 1). Differential Revision: https://reviews.llvm.org/D140798	2023-01-09 14:22:25 +01:00
Florian Hahn	68469a80cb	[LV] Disable runtime unrolling for vectorized loops. This patch adds metadata to disable runtime unrolling to the vectorized loop. If runtime unrolling/interleaving is considered profitable, LV will interleave the loop directly. There should be no need to perform runtime unrolling at a later stage. Note that we already add metadata to disable runtime unrolling to the scalar loop after vectorization. The additional unrolling unnecessarily increases code size and compile time. In addition to that we have several bug reports of unnecessary runtime unrolling for vectorized loops, e.g. PR40961 Compile-time improvements: NewPM-O3: -1.04% NewPM-ReleaseThinLTO: -0.59% NewPM-ReleaseLTO-g: -0.97% https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u Fixes #40306. Reviewed By: lebedev.ri, nikic Differential Revision: https://reviews.llvm.org/D115261	2023-01-06 10:56:17 +00:00
Roman Lebedev	08c2f4eb7a	[CVP] When expanding `urem`, always freeze the nominator As per the post-commit feedback - that was not the correct precondition to avoid it here. I think we should generally start changing mentality about `freeze`, the fact that we have been conditioned to be afraid of it (or of anything in LLVM in general) is the key problem here.	2022-12-31 05:00:43 +03:00
Roman Lebedev	66efb98632	[CVP] Expand bound `urem`s This kind of thing happens really frequently in LLVM's very own shuffle combining methods, and it is even considered bad practice to use `%` there, instead of using this expansion directly. Though, many of the cases there have variable divisors, so this won't help everything. Simple case: https://alive2.llvm.org/ce/z/PjvYf- There's alternative expansion via `umin`: https://alive2.llvm.org/ce/z/hWCVPb BUT while we can transform the first expansion into the `umin` one (e.g. for SCEV): https://alive2.llvm.org/ce/z/iNxKmJ ... we can't go in the opposite direction. Also, the non-`umin` expansion seems somewhat more codegen-friendly: https://godbolt.org/z/qzjx5bqWK https://godbolt.org/z/a7bj1axbx There's second variant of precondition: https://alive2.llvm.org/ce/z/zE6cbM but there the numerator must be non-undef / must be frozen.	2022-12-30 19:40:46 +03:00
Roman Lebedev	3d852d1e74	[NFC][PhaseOrdering] Re-autogenerate check lines in one test	2022-12-30 19:40:46 +03:00
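
Two of the InstCombine folds in the log above, the `usub.sat` fold (6b9317f52a) and the `icmp eq -1` fold (46aba711ab), are plain integer identities, so they can be sanity-checked outside LLVM. The following is a minimal C++ sketch, not code from either patch: the helper names are hypothetical, and it models the IR operations on 8-bit values only so that the check can be exhaustive.

```cpp
#include <cstdint>

// Reference model of llvm.usub.sat on 8-bit values: subtract, clamping at 0.
// (Hypothetical helper; not an LLVM API.)
constexpr std::uint8_t usub_sat_u8(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}

// Source pattern of the usub.sat fold: (a == 0) ? 0 : a - 1.
constexpr std::uint8_t zero_check_then_dec(std::uint8_t a) {
    return a == 0 ? std::uint8_t{0} : static_cast<std::uint8_t>(a - 1);
}

// Source pattern of the icmp fold: two compares against all-ones (-1), ANDed.
constexpr bool all_ones_separate(std::uint8_t a, std::uint8_t b) {
    return (a == 0xFF) && (b == 0xFF);
}

// Target pattern of the icmp fold: one compare of (a & b) against all-ones.
constexpr bool all_ones_merged(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a & b) == 0xFF;
}

// Exhaustively compare source and target patterns over every 8-bit input.
bool folds_agree_on_u8() {
    for (unsigned i = 0; i < 256; ++i) {
        const auto a = static_cast<std::uint8_t>(i);
        if (zero_check_then_dec(a) != usub_sat_u8(a, 1)) return false;
        for (unsigned j = 0; j < 256; ++j) {
            const auto b = static_cast<std::uint8_t>(j);
            if (all_ones_separate(a, b) != all_ones_merged(a, b)) return false;
        }
    }
    return true;
}
```

Calling `folds_agree_on_u8()` from a test or `main` exercises both identities. An 8-bit check is only evidence for that width; the Alive2 proofs linked from individual commits are the authoritative verification.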


493 Commits
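
The `memchr` transform described in 6321e4ddf7 rewrites a membership test against a constant string into a chain of comparisons. As a rough illustration only (the function names below are hypothetical and this is not the SimplifyLibCalls implementation), the two forms from that commit message can be compared over every byte value:

```cpp
#include <cstring>

// Library-call form: is c one of the six bytes of "ABCDEF"?
// The length 6 excludes the terminating NUL, as in the commit message.
bool in_set_memchr(unsigned char c) {
    return std::memchr("ABCDEF", c, 6) != nullptr;
}

// Expanded form: the chain of ORs the transform is described as producing.
bool in_set_or_chain(unsigned char c) {
    return c == 'A' || c == 'B' || c == 'C' ||
           c == 'D' || c == 'E' || c == 'F';
}

// Compare both forms for every possible byte value.
bool memchr_forms_agree() {
    for (unsigned i = 0; i < 256; ++i) {
        const auto c = static_cast<unsigned char>(i);
        if (in_set_memchr(c) != in_set_or_chain(c)) return false;
    }
    return true;
}
```

The expanded form avoids a library call and exposes the comparisons to later passes, which is the motivation the commit gives via the `std::string_view::find_first_of` example.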