llvm-project

Author	SHA1	Message	Date
Yingwei Zheng	6d667d4b26	[InstCombine] Combine const GEP chains This patch reverts rGae739aefd7473517d3f08b5c8d08a66c7f469198 to address performance regressions reported by our [CI](https://github.com/dtcxzyw/llvm-ci/issues/137) after rG2ec1d0f427c7822540352c0c14d057e7bfe4f77b. For example: ``` define ptr @const_gep_chain(ptr %p, i64 %a) { %p1 = getelementptr inbounds i8, ptr %p, i64 %a %p2 = getelementptr inbounds i8, ptr %p1, i64 1 %p3 = getelementptr inbounds i8, ptr %p2, i64 2 %p4 = getelementptr inbounds i8, ptr %p3, i64 3 ret ptr %p4 } ``` The last three GEPs will not be folded since rG2ec1d0f427c7822540352c0c14d057e7bfe4f77b. I think it is appropriate to remove this code because there is no compile-time regression reported in our benchmarks. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D149240	2023-05-02 00:28:39 +08:00
Florian Hahn	6303fa369c	[VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC) The argument isn't used. VPlan-based dead recipe removal can be used instead.	2023-05-01 15:03:29 +01:00
Vitaly Buka	e8893133d1	Revert "[NFC][HWASAN] Handle tags as Int8" More tests need updates. This reverts commit e876ba5db98830db427395ed9b3718d20bf519fb.	2023-04-30 20:59:43 -07:00
Vitaly Buka	e876ba5db9	[NFC][HWASAN] Handle tags as Int8	2023-04-30 19:58:01 -07:00
Vitaly Buka	0b97aff4d2	[NFC][HWASAN] Rename local variable	2023-04-30 19:49:25 -07:00
Vitaly Buka	f42f863c33	[NFC][HWASAN] Set constant type from another operand	2023-04-30 19:07:57 -07:00
Vitaly Buka	37f6c9f852	[HWASAN] Untag before tagging alloca pointers This is a follow-up to b5595836, which missed the Replacement variable. Before b5595836 the code assumed that alloca ptrs are not tagged so tagging is implemented as simple OR. So this patch completes support of tagged SP by passing untagged alloca pointers into tagPointer.	2023-04-30 18:26:58 -07:00
Valentin Churavy	bf08973277	Don't loop unswitch vector selects Otherwise we could produce `br <2x i1>` which are of course not legal. ``` Branch condition is not 'i1' type! br <2 x i1> %cond.fr1, label %entry.split.us, label %entry.split %cond.fr1 = freeze <2 x i1> %cond LLVM ERROR: Broken module found, compilation aborted! PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: /home/vchuravy/builds/llvm/bin/opt -passes=simple-loop-unswitch<nontrivial> -S ``` Fixes change introduced by https://reviews.llvm.org/D138526 Reviewed By: caojoshua Differential Revision: https://reviews.llvm.org/D149560	2023-04-30 19:19:29 -04:00
Florian Hahn	0b24436591	[LV] Clarify comment for selectVectorizationFactor (NFC). The comment is stale, as UserVF is handled before selectVectorizationFactor is called. Clarify the comment by removing the mention of UserVF. Suggested as independent improvement in D143938.	2023-04-30 21:12:15 +01:00
Florian Hahn	a431402fd2	[LV] Remove loop arg from CM::isCandidateForEpilogueVectorization (NFC) LVP operates on the loop it stores in TheLoop. Use it instead of the argument, to be in line with other member functions. Suggested as independent improvement in D143938.	2023-04-30 21:11:12 +01:00
Florian Hahn	6fa07a87ab	[LV] Document selectEpilogueVectorizationFactor (NFC). Add missing documentation for selectEpilogueVectorizationFactor. Suggested as independent improvement in D143938.	2023-04-30 21:09:24 +01:00
Florian Hahn	9fce1fc6f8	[LVP] Fix comment for hasPlanWithVF (NFC). The function checks if there's a plan with the specified VF. Update the comment to match the implementation. Pointed out as independent improvement in D143938.	2023-04-30 19:13:53 +01:00
Florian Hahn	8d3ff24e11	[LV] Sink collect* calls to LVP::plan() (NFC). Move calls of collect* helpers closer to where the cost-model is used. Should help simplify D142669 & D142670. Differential Revision: https://reviews.llvm.org/D142674	2023-04-30 11:41:22 +01:00
Nuno Lopes	8a1373d308	Revert "[InstCombine] Generate better code for std::bit_floor from libstdc++" This reverts commit d775fc390d3c78cc81872e276c4b1314f19af577. The patch is wrong wrt undef and the author didn't fix it after 2 weeks.	2023-04-30 09:56:34 +01:00
Joshua Cao	e479ed90b5	[SimpleLoopUnswitch] unswitch selects The old LoopUnswitch pass unswitched selects, but the changes were never ported to the new SimpleLoopUnswitch. We unswitch by turning: ``` S = select %cond, %a, %b ``` into: ``` head: br %cond, label %then, label %tail then: br label %tail tail: S = phi [ %a, %then ], [ %b, %head ] ``` Unswitch selects are always nontrivial, since the successors do not exit the loop and the loop body always needs to be cloned. Differential Revision: https://reviews.llvm.org/D138526 Co-authored-by: Sergey Kachkov <sergey.kachkov@syntacore.com>	2023-04-29 21:24:26 -07:00
Vitaly Buka	d3c37e2cd1	[NFC][HWASAN] Use pointercast instead of bitcast	2023-04-29 17:51:19 -07:00
Vitaly Buka	a1cca2e2d1	[NFC][HWASAN] Add cont to parameter	2023-04-29 17:51:19 -07:00
Vitaly Buka	2db925659e	[NFC][HWASAN] Fix comment	2023-04-29 17:51:19 -07:00
Vitaly Buka	87d473af69	[NFC][HWASAN] Remove unused parameter	2023-04-29 17:51:18 -07:00
Noah Goldstein	dc13624e88	[InstCombine] Fold `(cmp eq/ne (umax X, Y),0)` -> `(cmp eq/ne (or X, Y),0)` `or` is almost always preferable. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D149426	2023-04-29 12:38:43 -05:00
Noah Goldstein	ecad53c3f4	[InstCombine] Don't fold `uadd.sat` to `or` if it increase instruction count In the `(cmp eq/ne (uadd.sat X, Y),0)` case, we were missing a `hasOneUse` check. Differential Revision: https://reviews.llvm.org/D149425	2023-04-29 12:38:41 -05:00
Matt Arsenault	b52db60cbb	GlobalOpt: Drop code to handle typed pointers Fixes assert with pointers with different address spaces. We could keep looking through addrspacecast, but it would require checking for null handling of the access address space. Fixes #62384	2023-04-29 09:48:21 -04:00
Vitaly Buka	67caff6f32	[msan] Improve handling of Intrinsic::is_fpclass after c55fffe c55fffe replaced fcmp with fpclass. ``` declare i1 @llvm.is.fpclass(<fptype> <op>, i32 <test>) declare <N x i1> @llvm.is.fpclass(<vector-fptype> <op>, i32 <test>) ``` Perfect fix will require checking bits of <op> corresponding to <test> argument. For now just propagate shadow without reporting before intrinsic. Still existing handling of fcmp is also simple OR, so it's not making it worse. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D149491	2023-04-28 16:27:31 -07:00
wlei	ba3cbc7aad	fix a use-after-free failure	2023-04-28 15:55:51 -07:00
AdityaK	1ce2015f7e	[NFC] check for UnreachableInst first as it is cheaper compared to getTerminatingDeoptimizeCall Reviewers: craig.topper, aeubanks, Peter Differential Revision: https://reviews.llvm.org/D134490	2023-04-28 15:29:44 -07:00
Florian Hahn	4583d7ef7c	[LV] Rename Preheader -> VecPreheader (NFC). Clarify variable name as suggested in D147964 to reduce diff.	2023-04-28 22:15:47 +01:00
Teresa Johnson	39f7b48671	[MemProf] Use updated version of hot/cold operator new Switch to the just updated versions of the API in tcmalloc that change the name of the hot cold parameter to a reserved identifier __hot_cold_t. This was based on feedback from Richard Smith, as I also need to add some follow-on handling to clang so they are annotated properly. Differential Revision: https://reviews.llvm.org/D149475	2023-04-28 13:35:46 -07:00
wlei	892daede72	[SamplePGO] Stale profile matching(part 2) Part 2 of https://reviews.llvm.org/D147456 Use callee name on IR as an anchor to match the call target/inlinee name in the profile. The advantages of this in particular: - Different from the traditional way of encoding hash signatures to every block that would affect binary/profile size and build speed, it doesn't require any additional information for this, all the data is already in the IR and profiles. - Effective for current nested profile layout in which once a callsite is mismatched all the inlinee's profiles are dropped. The input of the algorithm: - IR locations: the anchor is the callee name of direct callsite. - Profile locations: the anchor is the call target name for `BodySample`s or inlinee's profile name for `CallsiteSamples`. The two lists are populated by parsing the IR and profile and both can be generalized as a sequence of locations with an optional anchor. For example: say location `1.2(foo)` refers to a callsite at `1.2` with callee name `foo` and `1.3` refers to a non-directcall location `1.3`. ``` // The current build source code: int main() { 1. ... 2. foo(); 3. ... 4 ... 5. ... 6. bar(); 7. ... } ``` IR locations are populated and simplified as: `[1, 2(foo), 3, 5, 6(bar), 7]`. ``` ; The "stale" profile: main:350:1 1: 1 2: 3 3: 100 foo:100 4: 2 7: 2 8: 200 bar:200 9: 30 ``` Profile locations are populated and simplified as `[1, 2, 3(foo), 4, 7, 8(bar), 9]` Matching heuristic: - Match all the anchors in lexical order first. - Match non-anchors evenly between two anchors: Split the non-anchor range, the first half is matched based on the start anchor, the second half is matched based on the end anchor. So the example above is matched like: ``` [1, 2(foo), 3, 5, 6(bar), 7] | | | | | | [1, 2, 3(foo), 4, 7, 8(bar), 9] ``` 3 -> 4 matching is based on anchor `foo`, 5 -> 7 matching is based on anchor `bar`. The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9].
For the implementation, the anchors are saved in a map for fast look-up. The result mapping is saved into `IRToProfileLocationMap` (see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples (`distributeIRToProfileLocationMap`). Clang-self build benchmark: Current build version: clang-10 The profiled version: clang-9 Results are compared to a refresh profile (profile collected on clang-10) and, to be fair, we invalidated new functions' profiles (both refresh and stale profile use the same profile list). 1) Regression to using refresh profile with this off: -3.93% 2) Regression to using refresh profile with this on: -1.1% So this algorithm can recover ~72% of the regression. On internal (Meta) large-scale services, we saw one real instance of a 3-week stale profile, where it delivered a ~1.8% win. Notes or future work: - Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO since pseudo-probe and line-number share the LineLocation structure. - The fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of regression (with some pseudo probe order change, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in future. - Profile call target names are only available when the call is hit by samples; the missing anchor might mislead the matching. This can be mitigated in llvm-profgen by generating the call target for the zero samples. - This doesn't handle function name mismatch; we plan to solve it in future. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D147545	2023-04-28 13:07:32 -07:00
wlei	a98d6a11ea	[SamplePGO] Stale profile matching(part 1) AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary. This patch is the part 1 of stale profile matching, summary of the implementation: - Added a structure for the matching result:`LocToLocMap`, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile). - In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(`IRToProfileLocationMap`) to `FunctionSamples`. - Added a wrapper(`mapIRLocToProfileLoc`) for the query to the location, the location from input IR will be remapped to the matched profile location. - Added a new switch `--salvage-stale-profile`. - Some refactoring for the staleness detection. Test case is in part 2 with the matching algorithm. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D147456	2023-04-28 13:07:32 -07:00
Vasileios Porpodas	fe1e50cd15	[NFC][SLP] Cleanup: Replace Value* operand with Instruction* in `vectorizeRootInstruction()` and `vectorizeHorReduction()` This makes it explicit that these functions work with instructions, and avoids calling them if the operand is not an instruction. Differential Revision: https://reviews.llvm.org/D149465	2023-04-28 12:35:09 -07:00
Vasileios Porpodas	92b2a266e9	[NFC][SLP] Cleanup: Moves code that changes the reduction root into a separate function. This makes `matchAssociativeReduction()` a bit simpler. Differential Revision: https://reviews.llvm.org/D149452	2023-04-28 10:05:32 -07:00
Jay Foad	56af0e913c	[EarlyCSE] Do not CSE convergent calls in different basic blocks "convergent" is documented as meaning that the call cannot be made control-dependent on more values, but in practice we also require that it cannot be made control-dependent on fewer values, e.g. it cannot be hoisted out of the body of an "if" statement. In code like this, if we allow CSE to combine the two calls: x = convergent_call(); if (cond) { y = convergent_call(); use y; } then we get this: x = convergent_call(); if (cond) { use x; } This is conceptually equivalent to moving the second call out of the body of the "if", up to the location of the first call, so it should be disallowed. Differential Revision: https://reviews.llvm.org/D149348	2023-04-28 14:50:48 +01:00
Nikita Popov	0659000ff7	[LICM] Don't duplicate instructions just because they're free D37076 makes LICM duplicate instructions into exit blocks if the instruction is free. For GEPs, the motivation appears to be that this allows the GEP to be folded into addressing modes, while non-foldable users outside the loop might prevent this. TBH I don't think LICM is the place to do this (why doesn't CGP apply this heuristic itself?) but at least I understand the motivation. However, the transform is also applied to all other "free" instructions, which are just that (removed during lowering and not "folded" in some way). For such instructions, this transform seems somewhere between useless, counter-productive (undoing CSE/GVN) and actively incorrect. For example, this transform can duplicate freeze instructions, which is illegal. This patch limits the transform to just foldable GEPs, though we might want to drop it from LICM entirely as a followup. This is a small compile-time improvement, because querying TTI cost model for every single instruction is expensive. Differential Revision: https://reviews.llvm.org/D149136	2023-04-28 14:31:23 +02:00
Florian Hahn	2c9d21a2a3	[VPlan] Turn Plan entry node into VPBasicBlock (NFCI). The entry to the plan is the preheader of the vector loop and guaranteed to be a VPBasicBlock. Make sure this is the case by adjusting the type. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149005	2023-04-28 12:29:06 +01:00
Bjorn Pettersson	cf59f649ff	Re-apply "[Passes] Remove legacy PM versions of InstructionNamer and MetaRenamer" A new attempt after removing uses of -instnamer in polly lit tests in D148530.	2023-04-28 13:18:45 +02:00
Nikita Popov	0aed0dbec2	[LCSSA] Don't invalidate entire loop in SCEV We already invalidate each individual instruction for which LCSSA is formed in formLCSSAForInstructions(), so I don't see a reason why we would need to invalidate the entire loop on top of that. I believe we also no longer need the instruction-level invalidation now that SCEV looks through LCSSA phis, but I'll leave that for a separate patch, as it's less obvious. Differential Revision: https://reviews.llvm.org/D149331	2023-04-28 12:17:26 +02:00
Jay Foad	31ec0a6845	[SimplifyCFG] Improve the way hoisting skips over non-matching instructions D129370 introduced the idea that hoisting could skip over non-matching instructions and continue to look for matching (hoistable) instructions, but certain types of mismatch still aborted the whole hoisting attempt. Fix this by splitting out some of the instruction matching checks into a helper function. Also forbid hoisting allocas past stacksave/stackrestore, completing the fix started in D133730, to avoid regressing tests. Differential Revision: https://reviews.llvm.org/D149365	2023-04-28 10:03:32 +01:00
Alexey Bataev	8bacd75125	[SLP][NFC]Fix a warning because of the missing parens, NFC.	2023-04-27 16:59:37 -07:00
Mingming Liu	b3cb950cf3	[PGO]Implement metadata combine for 'branch_weights' of direct callsites when none of the instructions folds the rest away. - Merge cases are added for simplify-cfg {sink,hoist}, based on https://gcc.godbolt.org/z/avGvc38W7 and https://gcc.godbolt.org/z/dbWbjGhaE - When one instruction folds the others in, do not update branch_weights with sum (see test/Transforms/GVN/calls-readonly.ll) Differential Revision: https://reviews.llvm.org/D148877	2023-04-27 13:04:17 -07:00
Christian Ulmann	c67079f1be	[PGO] Fix dead StringRef access This commit fixes a dead StringRef access introduced in https://reviews.llvm.org/D149324	2023-04-27 19:42:56 +00:00
Mircea Trofin	460ea85014	[nfc][thinlto] Handle global constant importing separately This makes the logic for referenced globals reusable for import criteria that don't use thresholds - in fact, we currently didn't consider any thresholds when importing. Differential Revision: https://reviews.llvm.org/D149298	2023-04-27 12:21:50 -07:00
Arthur Eubanks	3db8ae1f68	Revert "[MergeICmps] Adapt to non-eq comparisons, bugfix" This reverts commit ca94b02e559242e6d1fcdd65320334438be69448. Causes miscompiles, see D141188	2023-04-27 11:46:36 -07:00
Alexey Bataev	1604a100f1	[SLP][NFC]Avoid extra useless ConstantVector creation, use PointerUnion instead, NFC. Better to use PointerUnion<Value *, const TreeEntry *> instead of extra attempts of creating null vector values, where possible.	2023-04-27 10:48:14 -07:00
ManuelJBrito	d22edb9794	[IR][NFC] Change UndefMaskElem to PoisonMaskElem Following the change in shufflevector semantics, poison will be used to represent undefined elements in shufflevector masks. Differential Revision: https://reviews.llvm.org/D149256	2023-04-27 18:01:54 +01:00
Alexey Bataev	cf792f664a	[SLP]Fix a crash for the replaced vectorized value. If two nodes share the same value, which is replaced in one of the nodes, the same value needs to be replaced automatically in all nodes. Better to use WeakTrackingVH for this to fix the compiler crash.	2023-04-27 09:32:00 -07:00
Christian Ulmann	d8e15dc4ae	[PGO] Minor instrumentation code cleanup (NFC) This commit cleans up some parts of the PGO instrumentation. Most importantly, it removes a template parameter shadowing of a class name that could lead to confusion. Reviewed By: gysit Differential Revision: https://reviews.llvm.org/D149324	2023-04-27 16:10:10 +00:00
Jay Foad	ee88cd82a9	[SimplifyCFG] Remove some unnecessary TTI arguments. NFC. TTI was already available in the SimplifyCFGOpt class.	2023-04-27 17:05:14 +01:00
Christian Ulmann	a8dd375cbf	[PGO] Move CFGMST.h into the include directory This commit moves the CFGMST.h file into the include directory. The implemented algorithm can be helpful for downstream projects that want to use the PGO data in a non-standard way. Reviewed By: gysit Differential Revision: https://reviews.llvm.org/D149336	2023-04-27 14:11:04 +00:00
Zhongyunde	90d30fde12	[InstCombine] Add frozen for the condition value of SelectInst If the condition value of SelectInst may be a poison or undef value, inferring a constant range at the SelectInst use is incorrect, similar to D143883. Fixes https://github.com/llvm/llvm-project/issues/62401 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D149339	2023-04-27 21:35:54 +08:00
OCHyams	bd1109307a	[DebugInfo][InstCombine] Fix missing source and variable locations after foldOpIntoPhi Reviewed By: fdeazeve Differential Revision: https://reviews.llvm.org/D149335	2023-04-27 13:56:09 +01:00
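
Note on 892daede72 ([SamplePGO] Stale profile matching, part 2): the matching heuristic in that message (match anchors in lexical order, then split each run of non-anchors evenly between the surrounding anchors) can be illustrated with a small sketch. This is not the code from the patch. It is a simplified Python model that assumes locations are plain integers rather than line/discriminator pairs, and the name `match_stale_profile` and the offset-delta approach are illustrative assumptions.

```python
from collections import defaultdict, deque


def match_stale_profile(ir_locations, profile_locations):
    """Map IR locations to stale-profile locations (hypothetical helper).

    Both inputs are lists of (location, callee_name_or_None) in lexical
    order. Locations are modeled as plain ints, which is a simplification.
    """
    # Callee name -> profile locations carrying that anchor, in order.
    anchors = defaultdict(deque)
    for loc, callee in profile_locations:
        if callee is not None:
            anchors[callee].append(loc)

    mapping = {}
    delta = 0      # offset implied by the most recently matched anchor
    pending = []   # non-anchors matched forwards since that anchor

    for loc, callee in ir_locations:
        candidates = anchors.get(callee) if callee is not None else None
        if candidates:
            # Anchor match: take the earliest unused profile callsite.
            candidate = candidates.popleft()
            delta = candidate - loc
            mapping[loc] = candidate
            # Second half of the preceding non-anchors follows this anchor.
            for earlier in pending[(len(pending) + 1) // 2:]:
                mapping[earlier] = earlier + delta
            pending.clear()
        else:
            # Non-anchor (or unmatched callee): follow the previous anchor.
            mapping[loc] = loc + delta
            pending.append(loc)

    # Report only locations that actually moved.
    return {ir: prof for ir, prof in mapping.items() if ir != prof}


# The example from the commit message.
ir = [(1, None), (2, "foo"), (3, None), (5, None), (6, "bar"), (7, None)]
profile = [(1, None), (2, None), (3, "foo"), (4, None),
           (7, None), (8, "bar"), (9, None)]
result = match_stale_profile(ir, profile)
print(result)  # {2: 3, 3: 4, 5: 7, 6: 8, 7: 9}
```

On the example from the message this reproduces the stated mapping: 3 -> 4 follows anchor `foo` and 5 -> 7 follows anchor `bar`.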

33580 Commits