llvm-project

Author	SHA1	Message	Date
zhongyunde	df19d87227	[LV] Add option to tune the cost model, NFC For Neon, the default nonconst stride cost is conservative, and it is a local variable, which makes it inconvenient to tune the loop vectorizer. So I try to use an option, which is similar to SVEGatherOverhead brought in D115143. Fix https://github.com/llvm/llvm-project/issues/63082. Reviewed By: dmgreen, fhahn Differential Revision: https://reviews.llvm.org/D152253	2023-06-07 22:08:29 +08:00
Serguei Katkov	d57ed844fe	[CGP] Add test to show the missed case in remove llvm.assume	2023-06-07 17:20:57 +07:00
luxufan	e9ddb584e8	[LoopIdiom] Freeze BitPos if !isGuaranteedNotToBeUndefOrPoison Fixes: https://github.com/llvm/llvm-project/issues/62873 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D151690	2023-06-07 14:50:22 +08:00
Joshua Cao	cb9f1aadda	[ValueTracking] Implied conditions for lshr `V1 >> V2 u<= V1` for any V1, V2 This works for lshr and any div's that are changed to lshr's This fixes issues in clang and rustc: https://github.com/llvm/llvm-project/issues/62441 https://github.com/rust-lang/rust/issues/110971 Reviewed By: goldstein.w.n Differential Revision: https://reviews.llvm.org/D151541	2023-06-06 21:06:22 -07:00
Joshua Cao	f23b4faaff	[InstSimplify] Add tests for shl implied conditions	2023-06-06 21:06:22 -07:00
Chuanqi Xu	84c033d9ba	[LICM] [Coroutines] Don't hoist threadlocals within presplit coroutines Close https://github.com/llvm/llvm-project/issues/63022 This is a follow-up to https://reviews.llvm.org/D135550, which is discussed in https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579. In principle, we could fix the issue fundamentally after we introduce a new memory kind, thread id. But I am not sure we can fix the issue fundamentally in time. Besides that, I think correctness is the most important. So it should not be bad to land this given it is harmless. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D151774	2023-06-07 10:25:47 +08:00
Chuanqi Xu	eab8c1eb62	[Coroutines] [LICM] Precommit test for D151774 This is required in the review. Differential Revision: https://reviews.llvm.org/D151774	2023-06-07 10:11:22 +08:00
Paul Kirth	9ad3ca4e9a	Revert "[TypePromotion] Don't treat bitcast as a Source" This reverts commit 27aea17fe061f9778bb1e8ff5fdf9fc0fb03abe1. For details, see: https://reviews.llvm.org/D152112 Fuchsia CI failure: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-arm64/b8779118297575483793/overview	2023-06-07 00:51:20 +00:00
Matt Arsenault	95a3ae58b8	ValueTracking: Add baseline test for ldexp computeKnownFPClass	2023-06-06 17:09:25 -04:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Guozhi Wei	84bcfa0e1b	[GVN] Improve PRE on load instructions This patch implements the enhancement proposed by https://github.com/llvm/llvm-project/issues/59312. Suppose we have the following code v0 = load %addr br %LoadBB LoadBB: v1 = load %addr ... PredBB: ... br %cond, label %LoadBB, label %SuccBB SuccBB: v2 = load %addr ... Instruction v1 in LoadBB is partially redundant, edge (PredBB, LoadBB) is a critical edge. SuccBB is another successor of PredBB, it contains another load v2 which is identical to v1. Current GVN splits the critical edge (PredBB, LoadBB) and inserts a new load in it. A better method is to move the load of v2 into PredBB, then v1 can be changed to a PHI instruction. If there are two or more similar predecessors, like the test case in the bug entry, current GVN simply gives up because otherwise it needs to split multiple critical edges. But we can move all loads in successor blocks into predecessors. Differential Revision: https://reviews.llvm.org/D141712	2023-06-06 19:45:34 +00:00
Jolanta Jensen	a963dbb5ac	[SVE ACLE] Extend existing aarch64_sve_mul combines to also act on aarch64_sve_mul_u. Differential Revision: https://reviews.llvm.org/D152004	2023-06-06 15:26:33 +00:00
Mikhail Goncharov	df3a8f3760	Revert "Reland [MergeICmps] Adapt to non-eq comparisons, bugfix" Causes miscompile. See https://reviews.llvm.org/D141188. This reverts commit fb2c98a929aa65603e9d984307a41325e577e9d3	2023-06-06 16:26:52 +02:00
Florian Hahn	8f781b96e2	Revert "[VPlan] Mark recurrence recipes as not having side-effects." This reverts commit 02369b75fdd7b5fc5d9b47f1b60587c225918511. At the moment, live-outs used only for the resume values in the scalar loop are not modeled in VPlan yet. This means first-order recurrence recipes could be removed, when a scalar epilogue is required and the only use of a FOR is outside the loop. Keep treating recurrence recipes as having side-effects for now, to avoid them being removed. Fixes #62954.	2023-06-06 11:35:26 +02:00
Florian Hahn	f47084ecfb	[LV] Use force-vector-width for X86 recurrence test. This makes sure that all tests that can be vectorized in the file are vectorized.	2023-06-06 11:27:35 +02:00
Florian Hahn	4c51a45e80	[LV] Add test for #62954.	2023-06-06 11:20:22 +02:00
khei4	116670d192	[InstCombine] add overflow checking on Add ~X + C --> (C-1) - X Differential Revision: https://reviews.llvm.org/D152088	2023-06-06 12:24:45 +09:00
khei4	0505fcdccd	[InstCombine] precommit test for D152088(NFC) Differential Revision: https://reviews.llvm.org/D152089	2023-06-06 12:24:45 +09:00
Johannes Doerfert	cb17c48fdd	[Attributor] Identify and remove no-op fences The logic and implementation follows the removal of no-op barriers. If the fence is not making updates visible, either to the world or the current thread, it is not needed. Said differently, the fences we remove do not establish synchronization (happens-before) edges. This allows us to eliminate some of the regression caused by: https://reviews.llvm.org/D145290	2023-06-05 17:14:00 -07:00
Johannes Doerfert	532356e82d	[Attributor] Merge ranges by expansion, avoid unknown ranges Different offsets can be handled by expansion rather than defaulting to an unknown offset. Thus, [4,4] & [8,8] will result in [4, 12] rather than [unknown, unknown].	2023-06-05 16:53:46 -07:00
Johannes Doerfert	87d13b8776	[Attributor][NFC] Precommit vector write range tests	2023-06-05 16:53:45 -07:00
Johannes Doerfert	8f4fadd1b4	[OpenMP] Use "kernel" attribute consistently	2023-06-05 16:33:53 -07:00
Johannes Doerfert	dbbe9b3776	[Attributor] Create `AAMustProgress` for the `mustprogress` attribute Derive the mustprogress attribute based on the willreturn attribute or the fact that all callers are mustprogress. Differential Revision: https://reviews.llvm.org/D94740	2023-06-05 16:33:52 -07:00
Noah Goldstein	73ce343125	[InstCombine] Add transform `(icmp pred (shl {nsw and/or nuw} X, Y), C)` -> `(icmp pred X, C)` Three new transforms: 1) `(icmp pred (shl nsw nuw X, Y), C)` [if `C <= 0`] -> `(icmp pred X, C)` - ugt: https://alive2.llvm.org/ce/z/K_57J_ - sgt: https://alive2.llvm.org/ce/z/BL8u_a - sge: https://alive2.llvm.org/ce/z/yZZVYz - uge: https://alive2.llvm.org/ce/z/R4jwwJ - ule: https://alive2.llvm.org/ce/z/-gbmth - sle: https://alive2.llvm.org/ce/z/ycZVsh - slt: https://alive2.llvm.org/ce/z/4MzHYm - sle: https://alive2.llvm.org/ce/z/fgNfex - ult: https://alive2.llvm.org/ce/z/cXfvH5 - eq : https://alive2.llvm.org/ce/z/sZh_Ti - ne : https://alive2.llvm.org/ce/z/UrqSWA 2) `(icmp eq/ne (shl {nsw|nuw} X, Y), 0)` -> `(icmp eq/ne X, 0)` - eq+nsw: https://alive2.llvm.org/ce/z/aSJN6D - eq+nuw: https://alive2.llvm.org/ce/z/r2_-br - ne+nuw: https://alive2.llvm.org/ce/z/RkETtu - ne+nsw: https://alive2.llvm.org/ce/z/8iSfW3 3) `(icmp slt (shl nsw X, Y), 0/1)` -> `(icmp pred X, 0/1)` `(icmp sgt (shl nsw X, Y), 0/-1)` -> `(icmp pred X, 0/-1)` - slt: https://alive2.llvm.org/ce/z/eZYRan - sgt: https://alive2.llvm.org/ce/z/QQeP26 Transform 3) is really sle/slt/sge/sgt with 0, but sle/sge canonicalize to slt/sgt respectively so it's implemented as such. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D145341	2023-06-05 13:01:03 -05:00
Noah Goldstein	e8e8528085	[InstCombine] Add tests for transforming `(icmp pred (shl {nsw and/or nuw} X, Y), C)`; NFC Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D145340	2023-06-05 13:01:03 -05:00
Krzysztof Drewniak	23098bd454	[AMDGPU] Add intrinsic for converting global pointers to resources Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit pointer, the 16-bit stride/swizzling constant that replaces the high 16 bits of an address in a buffer resource, the 32-bit extent/number of elements, and the 32-bit flags (the latter two being the 3rd and 4th words of the resource), and combines them into a ptr addrspace(8). This intrinsic is lowered during the early phases of the backend. This intrinsic is needed so that alias analysis can correctly infer that a certain buffer resource points to the same memory as some global pointer. Previous methods of constructing buffer resources, which relied on ptrtoint, would not allow for such an inference. Depends on D148184 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D148957	2023-06-05 17:07:59 +00:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
Nikita Popov	79115aebb7	[LoopUnroll] Add test for SCEV invalidation issue (NFC) Test for the issue reported at https://reviews.llvm.org/D149331#4387931.	2023-06-05 17:28:32 +02:00
Antonio Frighetto	420cf63ead	[ConstraintElimination] Refactor `checkAndReplaceCondition` (NFC) Handling `true` and `false` constant replacements is now abstracted out into a single lambda function `ReplaceCmpWithConstant`, so as to reduce code duplication.	2023-06-05 16:54:58 +02:00
David Green	27aea17fe0	[TypePromotion] Don't treat bitcast as a Source This removes BitCasts from isSource in Type Promotion, as I don't believe they need to be treated as Sources. They will usually be from floats or hoisted constants, where constants will be handled already. This fixes #62513, but didn't otherwise cause any differences in the tests I ran. Differential Revision: https://reviews.llvm.org/D152112	2023-06-05 14:42:08 +01:00
khei4	4db8d4f839	[InstCombine] add overflow checking on AddSub `C-(X+C2) --> (C-C2)-X` Differential Revision: https://reviews.llvm.org/D152068	2023-06-05 20:05:06 +09:00
khei4	41588b5880	[InstCombine] precommit test for D152068(NFC) Differential Revision: https://reviews.llvm.org/D152091	2023-06-05 20:05:06 +09:00
Mateja Marjanovic	88421ea973	[AMDGPU] Trim zero components from buffer and image stores For image and buffer stores the default behaviour on GFX11 and older is to set all unset components to zero. So if we pass only X component it will be the same as X000, or XY same as XY00. This patch simplifies the passed vector of components in InstCombine by removing zero components from the end. For image stores it also trims DMask if necessary. Reviewed by: arsenm, foad, nhaehnle, piotr	2023-06-05 12:30:21 +02:00
Mikhail Gudim	d37d4072f2	Reapply [SCCP] Constant propagation through freeze instruction Reapply with extra check for struct types, which caused buildbot failures last time. ----- The freeze instruction has not been handled by SCCPInstVisitor. This patch adds SCCPInstVisitor::visitFreezeInst(FreezeInst &I) method to handle freeze instructions. Differential Revision: https://reviews.llvm.org/D151659	2023-06-05 11:47:36 +02:00
zhongyunde	34d380e1f6	[IndVars] Add check of loop invariant for indirect use We usually only check the direct use instructions of the IV, while the bitcast of 'ptrtoint ptr to i64' doesn't affect the result, so go a step further. Fix https://github.com/llvm/llvm-project/issues/59633. Reviewed By: markoshorro Differential Revision: https://reviews.llvm.org/D151877	2023-06-03 22:29:09 +08:00
luxufan	1ac99bc452	[InstSimplify] Simplify select i1 ConstExpr, i1 true, i1 false to ConstExpr `select i1 non-const, i1 true, i1 false` has been optimized to `non-const`. There is no reason that we can not optimize `select i1 ConstExpr, i1 true, i1 false` to `ConstExpr`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D151631	2023-06-03 15:09:00 +08:00
Kazu Hirata	d6f994acb3	[InlineCost] Check for conflicting target attributes early When we inline a callee into a caller, the compiler needs to make sure that the caller supports a superset of instruction sets that the callee is allowed to use. Normally, we check for the compatibility of target features via functionsHaveCompatibleAttributes, but that happens after we decide to honor call site attribute Attribute::AlwaysInline. If the caller contains a call marked with Attribute::AlwaysInline, which can happen with __attribute__((flatten)) placed on the caller, the caller could end up with code that cannot be lowered to assembly code. This patch fixes the problem by checking the target feature compatibility before we honor Attribute::AlwaysInline. Fixes https://github.com/llvm/llvm-project/issues/62664 Differential Revision: https://reviews.llvm.org/D150396	2023-06-02 16:00:47 -07:00
Matt Arsenault	1536e299e6	InstSimplify: Require instruction be parented Unlike every other analysis and transform, simplifyInstruction permitted operating on instructions which are not inserted into a function. This created an edge case no other code needs to really worry about, and limited transforms in cases that can make use of the context function. Only the inliner and a handful of other utilities were making use of this, so just fix up these edge cases. Results in some IR ordering differences since cloned blocks are inserted eagerly now. Plus some additional simplifications trigger (e.g. some add 0s now folded out that previously didn't).	2023-06-02 18:14:28 -04:00
Sami Tolvanen	2831a271c8	[KCFI] Emit debugtrap to make indirect call checks recoverable KCFI traps should always be recoverable, but as Intrinsic::trap is marked noreturn, it's not possible to continue execution after handling the trap as the compiler is free to assume we never return. Switch to debugtrap instead to ensure we have the option to resume execution after the trap.	2023-06-02 19:39:13 +00:00
Matt Arsenault	2fef38f82d	SimpleLoopUnswitch: Add missing test coverage for divergent target check No tests failed when I removed the hasBranchDivergence check, so add one.	2023-06-02 08:30:06 -04:00
Nikita Popov	fa45fb7f0c	[InstCombine] Handle assumes in multi-use demanded bits simplification This fixes the largest remaining discrepancy between results of computeKnownBits() and SimplifyDemandedBits(). We only care about the multi-use case here, because the assume necessarily introduces an extra use.	2023-06-02 14:24:24 +02:00
Jolanta Jensen	dc63b35b02	[SVE ACLE] Extend IR combines for fmul, fsub, fadd to cover _u variants This patch extends existing IR combines for: fmul, fsub and fadd, relying on all active predicate to also apply to their equivalent undef (_u) intrinsics. Differential Revision: https://reviews.llvm.org/D150768	2023-06-02 11:06:57 +00:00
Matt Arsenault	8609df7c6e	AMDGPU: Refine undef handling for llvm.amdgcn.class intrinsic This barely matters since 99% are converted to the generic intrinsic now, and the only real difference is the target intrinsic supports a variable test mask. Start propagating poison. Prefer folding to a defined result (false) for an undef test mask. Propagate undef for the first operand.	2023-06-01 18:35:55 -04:00
Alexandros Lamprineas	b1f41685a6	[IPSCCP] Decouple queries for function analysis results. The SCCPSolver is using a structure (AnalysisResultsForFn) where it keeps pointers to various analyses needed by the IPSCCP pass. These analyses are requested all at the same time, which can become problematic in some cases. For example, one could be retrieved via getCachedAnalysis() prior to the actual execution of the analysis. In more detail: The IPSCCP pass uses a DomTreeUpdater to preserve the PostDominatorTree in case the PostDominatorTreeAnalysis had run before IPSCCP. Starting with commit 1b1232047e83b the IPSCCP pass may use BlockFrequencyAnalysis for some functions in the module. As a result, the PostDominatorTreeAnalysis may not run until the BlockFrequencyAnalysis has run, since the latter analysis depends on the former. Currently, we set up the DomTreeUpdater using getCachedAnalysis to retrieve a PostDominatorTree. This happens before BlockFrequencyAnalysis has run, therefore the cached analysis can become invalid by the time we use it. Differential Revision: https://reviews.llvm.org/D151666	2023-06-01 16:38:04 +01:00
Florian Hahn	3b912e269a	[LV] Bail out on loop-variant steps when rewriting SCEV exprs. If the step is not loop-invariant, we cannot create a modified AddRec, as the start needs to be loop-invariant. Mark those cases as CannotAnalyze and bail out, to fix a crash.	2023-06-01 16:14:02 +01:00
Nikita Popov	0213c6d0df	[InstCombine] Use DL-aware constant folding for phi compare Serves the dual purpose of avoiding an extra InstCombine iteration for the DL-aware folding and removing one icmp constexpr use.	2023-06-01 16:02:36 +02:00
Paulo Matos	9485d983ac	[InstCombine] Disable generation of fshl/fshr for rotates Disable conversion of funnel shifts (fshl/fshr) into rotates unless one of the operands is known to be a constant value. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D150670	2023-06-01 15:31:49 +02:00
Nikita Popov	223f9b096e	Revert "[SCCP] Constant propagation through freeze instruction" This reverts commit 559d47a1790e1a9f9b1f8838a443eb7624ef1ac7. Caused failure on sanitizer-aarch64-linux-bootstrap-ubsan: clang++: /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Transforms/Utils/SCCPSolver.cpp:442: llvm::ValueLatticeElement &llvm::SCCPInstVisitor::getValueState(llvm::Value *): Assertion `!V->getType()->isStructTy() && "Should use getStructValueState"' failed.	2023-06-01 15:30:46 +02:00
Mikhail Gudim	559d47a179	[SCCP] Constant propagation through freeze instruction The freeze instruction has not been handled by SCCPInstVisitor. This patch adds SCCPInstVisitor::visitFreezeInst(FreezeInst &I) method to handle freeze instructions. Differential Revision: https://reviews.llvm.org/D151659	2023-06-01 15:06:59 +02:00
Igor Kirillov	50dfc9e35d	[LoopLoadElimination] Add support for stride equal to -1 This patch allows us to gain all the benefits provided by LoopLoadElimination pass to descending loops. Differential Revision: https://reviews.llvm.org/D151448	2023-06-01 12:10:53 +00:00
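
Several entries in the log above (cb9f1aadda, 116670d192, 4db8d4f839) rest on fixed-width integer identities. As a rough sanity check only, and not the way LLVM itself validates these folds, the sketch below brute-forces three of them in wrapping arithmetic at a small bit width. The function name `check_identities` and the default width of 4 bits are assumptions made for illustration; the `nsw`/`nuw` overflow-flag handling that the InstCombine commits add is not modelled here.

```python
def check_identities(bits: int = 4) -> int:
    """Exhaustively check three wrapping-integer identities at `bits` width.

    Returns the number of cases checked; raises AssertionError on a mismatch.
    Hypothetical helper for illustration, not part of LLVM.
    """
    mask = (1 << bits) - 1   # keeps results within `bits` bits (wrapping)
    size = 1 << bits
    checked = 0
    for x in range(size):
        for c in range(size):
            # ~X + C == (C - 1) - X
            lhs = ((~x & mask) + c) & mask
            rhs = ((c - 1) - x) & mask
            assert lhs == rhs
            checked += 1
            for c2 in range(size):
                # C - (X + C2) == (C - C2) - X
                lhs = (c - ((x + c2) & mask)) & mask
                rhs = ((c - c2) - x) & mask
                assert lhs == rhs
                checked += 1
        for shift in range(bits):
            # Logical shift right never increases an unsigned value:
            # V1 >> V2 u<= V1 (shift amounts below the bit width only)
            assert (x >> shift) <= x
            checked += 1
    return checked


if __name__ == "__main__":
    print(check_identities(4), "cases checked")
```

The identities hold for plain wrapping arithmetic; whether a wrap flag can be kept on the rewritten instruction is a separate question that this sketch does not address.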

25910 Commits