llvm-project

Author	SHA1	Message	Date
Noah Goldstein	9ef829097b	[InstCombine] Fix buggy transform in `foldNestedSelects`; PR 71330 The bug is that `IsAndVariant` is used to assume which arm of the select the output `SelInner` should be placed in, but the inner select condition is matched with `m_c_LogicalOp`. With fully simplified ops this works fine, but if the select condition is not simplified, it is possible for it to match both `LogicalAnd` and `LogicalOr`, i.e. `select true, true, false`. In PR71330, for example, the issue occurs in the following IR: ``` define i32 @bad() { %..i.i = select i1 false, i32 0, i32 3 %brmerge = select i1 true, i1 true, i1 false %not.cmp.i.i.not = xor i1 true, true %.mux = zext i1 %not.cmp.i.i.not to i32 %retval.0.i.i = select i1 %brmerge, i32 %.mux, i32 %..i.i ret i32 %retval.0.i.i } ``` When simplifying: ``` %retval.0.i.i = select i1 %brmerge, i32 %.mux, i32 %..i.i ``` We end up matching `%brmerge` as `LogicalAnd` for `IsAndVariant`, but the inner select (`%..i.i`) condition, which is `false`, as `LogicalOr`. Closes #71489	2023-11-09 16:36:49 -06:00
Nikita Popov	ed86e740ef	Revert "[SROA] Limit the number of allowed slices when trying to split allocas" This reverts commit e13e808283f7fd9e873ae922dd1ef61aeaa0eb4a. This causes performance regressions on GPU targets, see https://github.com/llvm/llvm-project/issues/69785. Revert the change for now.	2023-11-09 16:38:52 +01:00
Nikita Popov	369c9b791b	[MemCpyOpt] Require writable object during call slot optimization (#71542) Call slot optimization may introduce writes to the destination object that occur earlier than in the original function. We currently already check that the destination is dereferenceable and aligned, but we do not make sure that it is writable. As such, we might introduce a write to read-only memory, or introduce a data race. Fix this by checking that the object is writable. For arguments, this is indicated by the new writable attribute. Tests using sret/dereferenceable are updated to use it.	2023-11-09 15:55:44 +01:00
Nikita Popov	1b1c81772f	[InstCombine] Drop poison flags in simplifyAssocCastAssoc() The nneg flag on zext may no longer hold after the reassociation.	2023-11-09 11:58:02 +01:00
Chuanqi Xu	b7b5907b56	[Coroutines] Introduce [[clang::coro_only_destroy_when_complete]] (#71014) Close https://github.com/llvm/llvm-project/issues/56980. This patch introduces a light-weight optimization attribute for coroutines which are guaranteed to only be destroyed after they have reached the final suspend point. The rationale behind the patch is simple. See the example: ```C++ A foo() { dtor d; co_await something(); dtor d1; co_await something(); dtor d2; co_return 43; } ``` Generally the generated .destroy function may be: ```C++ void foo.destroy(foo.Frame frame) { switch(frame->suspend_index()) { case 1: frame->d.~dtor(); break; case 2: frame->d.~dtor(); frame->d1.~dtor(); break; case 3: frame->d.~dtor(); frame->d1.~dtor(); frame->d2.~dtor(); break; default: // coroutine completed or haven't started break; } frame->promise.~promise_type(); delete frame; } ``` This is because the compiler needs to be ready for all the cases in which the coroutine may be destroyed in a valid state. However, from the user's perspective, we can understand that certain coroutine types may only be destroyed after they have reached the final suspend point, and we need a method to teach the compiler about this. That is what this patch provides. Once the compiler recognizes that the coroutine can only be destroyed after completion, it can optimize the above example to: ```C++ void foo.destroy(foo.Frame frame) { frame->promise.~promise_type(); delete frame; } ``` I spent a lot of time experimenting with this downstream. The numbers are really good. In a real-world coroutine-heavy workload, the size of the build dir (including .o files) is reduced by 14%, and the size of the final libraries (excluding the .o files) is reduced by 8% in Debug mode and 1% in Release mode.	2023-11-09 14:42:07 +08:00
Allen	7ec86f4d68	[SimplifyCFG] Fix the compile crash for invalid upper bound value (#71351) Fix the crash introduced by the recently landed PR70542. Note: For '%add = add nuw i32 %x, 1', we can only infer that the LowerBound is 1, but the UpperBound wraps to 0 in computeConstantRange, so we can't assume the UpperBound is a valid bound when its value is 0. Fix https://github.com/llvm/llvm-project/issues/71329. Reviewed By: zmodem, nikic	2023-11-09 12:33:24 +08:00
Anna Thomas	29f03bf48d	[GuardWidening] Require analyses only if necessary We need to request analyses needed for guard widening only if there are guards/widenable conditions.	2023-11-08 11:54:10 -05:00
Jeremy Morse	f1b0a54451	Reapply 7d77bbef4ad92, adding new debug-info classes This reverts commit 957efa4ce4f0391147cec62746e997226ee2b836. Original commit message below -- in this follow up, I've shifted un-necessary inclusions of DebugProgramInstruction.h into being forward declarations (fixes clang-compile time I hope), and a memory leak in the DebugInfoTest.cpp IR unittests. I also tracked a compile-time regression in D154080, more explanation there, but the result of which is hiding some of the changes behind the EXPERIMENTAL_DEBUGINFO_ITERATORS compile-time flag. This is tested by the "new-debug-iterators" buildbot. [DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info This patch adds a variety of classes needed to record variable location debug-info without using the existing intrinsic approach, see the rationale at [0]. The two added files and corresponding unit tests are the majority of the plumbing required for this, but at this point isn't accessible from the rest of LLVM as we need to stage it into the repo gently. An overview is that classes are added for recording variable information attached to Real (TM) instructions, in the form of DPValues and DPMarker objects. The metadata-uses of DPValues is plumbed into the metadata hierarchy, and a field added to class Instruction, which are all stimulated in the unit tests. The next few patches in this series add utilities to convert to/from this new debug-info format and add instruction/block utilities to have debug-info automatically updated in the background when various operations occur. This patch was reviewed in Phab in D153990 and D154080, I've squashed them together into this commit as there are dependencies between the two patches, and there's little profit in landing them separately. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939	2023-11-08 16:42:35 +00:00
Nikita Popov	2c61f9cab5	[CVP] Fix use after scope Store the result of ConstantRange::sdiv() in a variable, as getSingleElement() will return a pointer to the APInt it contains.	2023-11-08 16:53:47 +01:00
Florian Hahn	26ab444e88	[ConstraintElim] Make sure add-rec is for the current loop. Update addInfoForInductions to also check if the add-rec is for the current loop. Otherwise we might add incorrect facts or crash. Fixes a miscompile & crash introduced by 00396e6a1a0b.	2023-11-08 14:07:28 +00:00
Nikita Popov	d687057de8	[CVP] Try to fold sdiv to constant If we know that the sdiv result is a single constant, directly use that instead of performing narrowing. Fixes https://github.com/llvm/llvm-project/issues/71659.	2023-11-08 14:49:24 +01:00
Markos Horro	9d2903c8e5	[IndVars] Add check of loop invariant for trunc instructions (#71072) The same idea as in 34d380e1f63a7e2cdb9ab1e6498f727fcd710a14, but considering truncation instructions. Improvement for #59633.	2023-11-08 11:16:23 +00:00
Nikita Popov	567c02a80e	[InstCombine] Remove inttoptr/ptrtoint handling from indexed compare fold Looking through inttoptr / ptrtoint intermixed with GEPs is very questionable from a provenance perspective. We also don't seem to have any test coverage that shows this is useful (apart from one test I added to guard against a crash).	2023-11-08 11:13:57 +01:00
Nikita Popov	5918f62301	[InstCombine] Infer zext nneg flag (#71534) Use KnownBits to infer the nneg flag on zext instructions. Currently we only set nneg when converting sext -> zext, but don't set it when we have a zext in the first place. If we want to use it in optimizations, we should make sure the flag inference is consistent.	2023-11-08 09:34:40 +01:00
Vladislav Dzhidzhoev	6beddd668a	Revert "[DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)" This caused assert: llvm/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp:110: void llvm::DwarfFile::addScopeVariable(LexicalScope , DbgVariable ): Assertion `Ret.second' failed. See comments https://reviews.llvm.org/D144006#4656350. This reverts commit 3b449bd46a11a55a40cbc0016a99b202fa05248e.	2023-11-08 00:29:24 +01:00
Antonio Frighetto	7d39838948	[InstCombine] Favour `CreateZExtOrTrunc` in `narrowFunnelShift` (NFC) Use `CreateZExtOrTrunc`, reduce test and regenerate checks.	2023-11-07 22:48:14 +01:00
Paulo Matos	7b9d73c2f9	[NFC] Remove Type::getInt8PtrTy (#71029) Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.	2023-11-07 17:26:26 +01:00
Philip Reames	551c280cfd	[indvars] Always fallback to truncation if AddRec widening fails (#70967) The current code structure results in cases where, if a) we can't clone the IV user (because it's not in our whitelist) or b) we can't prove the SCEV expressions are identical, we'd sometimes leave both the original unwidened IV and the partially widened IV in the code. Instead, just truncate the wide IV to the use - same as what we'd do if we couldn't find an addrec to start with. Noticed this while playing with changing how we produce addrecs. The current structure results in a very tight interlock between SCEV's internal capabilities and indvars code.	2023-11-07 07:49:39 -08:00
Antonio Frighetto	caa124b58d	[InstCombine] Zero-extend shift amounts in narrow funnel shift ops An issue arose when handling shift amounts while performing narrowed funnel shifts simplification. Specifically, shift amounts were incorrectly truncated when their type was narrower than the target bit width. This has been addressed by zero-extending `ShAmt` in such cases. Fixes: https://github.com/llvm/llvm-project/issues/71463. Proof: https://alive2.llvm.org/ce/z/5draKz.	2023-11-07 14:15:32 +01:00
Nikita Popov	6e56c35d19	[SpeculativeExecution] Add only-if-divergent-target pass option The optimization pipeline enables this option, but it was not preserved in -print-pipeline-passes output.	2023-11-07 11:49:37 +01:00
Hans Wennborg	05ed92127c	Revert "Reland [SimplifyCFG] Delete the unnecessary range check for small mask operation (#70542)" This caused https://github.com/llvm/llvm-project/issues/71329 > Fix the compile crash when the default result has no result for > https://github.com/llvm/llvm-project/pull/65835 > > Fixes https://github.com/llvm/llvm-project/issues/65120 > Reviewed By: zmodem, nikic This reverts commit 7c4180a36a905b7ed46c09df77af1b65e356f92a.	2023-11-07 10:53:22 +01:00
Nikita Popov	e360a16fee	[GlobalOpt] Cache whether CC is changeable (#71381) The hasAddressTaken() call in hasOnlyColdCalls() has quadratic complexity if there are many cold calls to a function: We're going to visit each call of the function, and then for each of them iterate all the users of the function. We've recently encountered a case where GlobalOpt spends more than an hour in these hasAddressTaken() checks when full LTO is used. Avoid this by moving the hasAddressTaken() check into hasChangeableCC() and caching its result, so it is only computed once per function.	2023-11-07 10:36:45 +01:00
Allen	a0cd6265bc	[InstCombine] Split the FMul with reassoc into a helper function, NFC (#71493) The reassoc check is really hard to find because the handling branch is too large, so split it into a helper function.	2023-11-07 15:30:56 +08:00
Philip Reames	23099ac239	Add known and demanded bits support for zext nneg (#70858) zext nneg was recently added to the IR in #67982. This patch teaches demanded bits and known bits about the semantics of the instruction, and adds a couple of test cases to illustrate basic functionality.	2023-11-06 18:47:56 -08:00
LiqinWeng	5d3d08463d	[InstCombinePHI] Remove dead PHI on UnaryOperator (#71386) This patch mainly solves the problem of dead PHI on UnaryOperator	2023-11-07 09:45:33 +08:00
Tom Stellard	2400c54c37	[Vectorize] Remove Transforms/Vectorize.h (#71294) The only thing in this file is a declaration for createLoadStoreVectorizerPass(), and this function is already declared in LoadStoreVectorizer.h.	2023-11-06 14:04:22 -08:00
Simon Pilgrim	3ca4fe80d4	[Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC. startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)	2023-11-06 16:50:18 +00:00
Florian Hahn	a002271972	[VPlan] Add VPValue::replaceUsesWithIf (NFCI). Add replaceUsesWithIf helper and use it in a few places.	2023-11-06 16:08:22 +00:00
Nikita Popov	c4c0ac10f1	[IPO] Remove unnecessary bitcasts (NFC)	2023-11-06 16:49:45 +01:00
Alexey Bataev	ac254fc055	[SLP]Improve tryToGatherExtractElements by using per-register analysis. Currently the tryToGatherExtractElements function analyzes the whole vector, regardless of the number of actual registers used in this vector. This may prevent some optimizations, because per-register analysis may allow simplifying the final code by reusing more already emitted vectors and better shuffles. Differential Revision: https://reviews.llvm.org/D148855	2023-11-06 07:29:27 -08:00
Nikita Popov	be3cef0b2a	[LibCallsShrinkWrap] Avoid use of ConstantExpr::getFPExtend() (NFC) Use the constant folding API instead.	2023-11-06 15:38:42 +01:00
Nikita Popov	16a595e398	[Attributor] Avoid use of ConstantExpr::getFPTrunc() (NFC) Use the constant folding API instead. For simplicity I'm using the DL-independent API here.	2023-11-06 15:27:01 +01:00
Nikita Popov	25af06fd7a	[InstCombine] Avoid use of FP cast constant expressions (NFC) Use the constant folding API instead. As we're working on plain ConstantFP, this should always succeed.	2023-11-06 15:22:33 +01:00
Nikita Popov	abc27bd31f	[InstCombine] Avoid some FP cast constant expressions (NFCI) Instead of doing fptoxi and xitofp casts to check for round-trip, directly check the IsExact flag on the convertToInteger() API.	2023-11-06 14:42:42 +01:00
Hans Wennborg	046c57e705	Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis." This causes asserts: llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10082: Value llvm::slpvectorizer::BoUpSLP::ShuffleInstructionBuilder::adjustExtracts( const TreeEntry , MutableArrayRef<int>, unsigned int, bool &): Assertion `Part == 0 && "Expected firs part."' failed. See comment on the code review. > Currently the tryToGatherExtractElements function analyzes the whole vector, > regardless of the number of actual registers used in this vector. This may > prevent some optimizations, because per-register analysis may allow > simplifying the final code by reusing more already emitted vectors and > better shuffles. > > Differential Revision: https://reviews.llvm.org/D148855 This reverts commit 9dfdbd788707edc8c39eb2bff16004aba1f3586b.	2023-11-06 13:56:42 +01:00
Dominik Adamski	2cce0f6c57	[OpenMP][OMPIRBuilder] Add support to omp target parallel (#67000) Added support for LLVM IR code generation which is used for handling omp target parallel code. The call to __kmpc_parallel_51 is generated and the parallel region is outlined to a separate function. The proper setup of the kmpc_target_init mode is not included in this commit. It is assumed that the SPMD mode for target initialization is properly set by other codegen functions.	2023-11-06 11:44:00 +01:00
Noah Goldstein	ad9147399f	[InstCombine] Improve eq/ne by parts to handle `ult/ugt` equality pattern. `(icmp eq/ne (lshr x, C), (lshr y, C))` gets optimized to `(icmp ult/uge (xor x, y), (1 << C))`. This can cause the current equal by parts detection to miss the high-bits as it may get optimized to the new pattern. This commit adds support for detecting / combining the ult/ugt pattern. Closes #69884	2023-11-04 19:00:28 -05:00
Teresa Johnson	87f5e22987	[MemProf] Tolerate missing leaf debug frames (#71233) Loosen up the matching so that a missing leaf debug frame in the profile does not prevent matching an allocation context if we can match further up the inlined call context. This relies on the pre-inliner, which was already the default when performing normal PGO feedback along with the MemProf feedback, but to ensure matching is not affected by the presence of PGO, enable the pre-inliner for MemProf feedback as well.	2023-11-03 21:01:07 -07:00
Nikita Popov	a682a9cfd0	Revert "Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)" This reverts commit 19b5495b653a00da7a250f48b4f739fcf2bbe82f. PR landed without approval, with severe quality issues.	2023-11-03 21:15:46 +01:00
XChy	c880fdc0f0	[DFAJumpThreading] Remove incoming StartBlock from all phis when unfolding select (#71082) Fixes #65222. When unfolding select into diamond-like control flow, we need to remove the StartBlock from all phis in EndBlock.	2023-11-04 03:32:20 +08:00
Philip Reames	5adf6ab7ff	Revert "[IndVars] Generate zext nneg when locally obvious" This reverts commit a6c8e27b3a052913a15a13ee0d4ac466c5ab3f92. It appears likely to have caused https://lab.llvm.org/buildbot/#/builders/57/builds/30988.	2023-11-03 11:19:14 -07:00
Manman Ren	19b5495b65	Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235) See RFC for details: https://discourse.llvm.org/t/rfc-for-moving-swift-s-merge-function-pass-to-llvm/73778 We will need to refactor extension to FunctionComparator/FunctionHash to StructuralHash. This patch adds a new pass ported from Swift; we will need to discuss how to migrate Swift’s pass over after this lands in llvm. This PR was created to get some early review on the patch. --------- Co-authored-by: Manman Ren <mren@meta.com>	2023-11-03 11:13:58 -07:00
Philip Reames	7c93452e17	[indvars] Restructure getExtendedOperandRecurrence [nfc] As suggested during review of https://github.com/llvm/llvm-project/pull/70990.	2023-11-03 10:50:57 -07:00
Alexey Bataev	9dfdbd7887	[SLP]Improve tryToGatherExtractElements by using per-register analysis. Currently the tryToGatherExtractElements function analyzes the whole vector, regardless of the number of actual registers used in this vector. This may prevent some optimizations, because per-register analysis may allow simplifying the final code by reusing more already emitted vectors and better shuffles. Differential Revision: https://reviews.llvm.org/D148855	2023-11-03 10:43:58 -07:00
Johannes Doerfert	d3e7a48cbd	[OpenMP][NFC] Remove a no-op function	2023-11-03 10:28:36 -07:00
Philip Reames	1ffea97ffd	[indvars] Support known positive extends in getExtendedOperandRecurrence (#70990) IndVars has the existing notion of a narrow definition which is known to be positive, and thus both sign and zero extension kinds are actually the same operation. There's existing logic for forming a SCEV based on the extension kind and the no-wrap flags. This change extends that logic to form the opposite extension kind for a positive def if doing so is allowed by the flags. Note that we already do something analogous for the getWideRecurrence case as well.	2023-11-03 10:21:30 -07:00
Ellis Hoag	890335bb28	[InstrProf] Do not block functions from PGOUse (#71106) The `skipPGO()` function was added in https://reviews.llvm.org/D137184. Unfortunately, it also blocked functions from being annotated (PGOUse), which I believe will cause confusion to users if a function has a profile but it is not PGO'd. The docs for `noprofile` and `skipprofile` only claim to block instrumentation, not PGO optimization: https://llvm.org/docs/LangRef.html	2023-11-03 09:41:26 -07:00
Philip Reames	a6c8e27b3a	[IndVars] Generate zext nneg when locally obvious zext nneg was recently added to the IR in #67982. This patch teaches SimplifyIndVars to prefer zext nneg over both sext and plain zext, when a local SCEV query indicates the source is non-negative. The choice to prefer zext nneg over sext looks slightly aggressive here, but probably isn't so much in practice. For cases where we'd "remember" the range fact, instcombine would convert the sext into a zext nneg anyways. The only cases where this produces a different result overall are when SCEV knows a non-local fact, and it doesn't get materialized into the IR. Those are exactly the cases where using zext nneg is most useful. We do run the risk of e.g. a missing combine - since we haven't updated most of them yet - but that seems like a manageable risk. Note that there are much deeper algorithmic changes we could make to this code to exploit zext nneg, but this seemed like a reasonable and low risk starting point.	2023-11-03 09:20:59 -07:00
Nikita Popov	5c3beb7b1e	[MemCpyOpt] Handle memcpy marked as memory(none) Fixes #71183.	2023-11-03 15:20:21 +01:00
Nikita Popov	03110ddeb2	[IR] Remove ZExtOperator (NFC) Now that zext constant expressions are no longer supported, ZExtInst should be used instead.	2023-11-03 14:52:59 +01:00
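
Sketch for e360a16fee ([GlobalOpt] Cache whether CC is changeable): the message describes a per-call-site check that rescans all users of a function, and a fix that computes the answer once per function. The toy model below only illustrates that cost difference. `ToyFunction`, `scanForAddressTaken`, `ChangeableCCCache`, and both `visited...` helpers are hypothetical names invented for this sketch; they are not LLVM's types or the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a function and its use list; not an LLVM type.
struct ToyFunction {
  std::vector<bool> UserTakesAddress; // one entry per user of the function
};

// Linear scan over every user, comparable in cost to an address-taken check.
// Visited counts how many user entries were inspected.
bool scanForAddressTaken(const ToyFunction &F, std::size_t &Visited) {
  for (bool TakesAddress : F.UserTakesAddress) {
    ++Visited;
    if (TakesAddress)
      return true;
  }
  return false;
}

// Memoizes the per-function answer so the scan runs at most once per function.
class ChangeableCCCache {
  std::unordered_map<const ToyFunction *, bool> Cache;

public:
  bool hasChangeableCC(const ToyFunction &F, std::size_t &Visited) {
    auto It = Cache.find(&F);
    if (It != Cache.end())
      return It->second;
    bool Result = !scanForAddressTaken(F, Visited);
    Cache.emplace(&F, Result);
    return Result;
  }
};

// Repeats the scan once per call site: quadratic in the number of users.
std::size_t visitedWithoutCache(const ToyFunction &F) {
  std::size_t Visited = 0;
  for (std::size_t I = 0; I < F.UserTakesAddress.size(); ++I)
    (void)scanForAddressTaken(F, Visited);
  return Visited;
}

// Asks the cache once per call site: the scan itself runs only once.
std::size_t visitedWithCache(const ToyFunction &F) {
  std::size_t Visited = 0;
  ChangeableCCCache Cache;
  for (std::size_t I = 0; I < F.UserTakesAddress.size(); ++I)
    (void)Cache.hasChangeableCC(F, Visited);
  return Visited;
}
```

With N users and no address-taken use, the uncached variant inspects N*N entries while the cached variant inspects N, which is the kind of gap the commit message attributes to the repeated hasAddressTaken() checks.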


35023 Commits