Reapply "IR: Remove uselist for constantdata (#137313)"
This reverts commit 5936c02c8b9c6d1476f7830517781ce8b6e26e75.
Fix checking use-lists of constants in assume bundle queries
In this change, NVPTX AA is moved before Basic AA to potentially improve
compile time. Additionally, it introduces a flag in the
`ExternalAAWrapper` that allows other backends to run their
target-specific AA passes before Basic AA, if desired.
The change works for both the New Pass Manager and the Legacy Pass Manager.
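A sketch of the mechanism (the `RunEarly` flag name is an assumption based on the description above; `NVPTXAAWrapperPass` is NVPTX's existing legacy AA wrapper):
```cpp
#include "llvm/Analysis/AliasAnalysis.h"

// Hedged sketch: a backend hooks its AA into ExternalAAWrapperPass and
// opts in to running before Basic AA via a flag like the one described.
struct NVPTXExternalAAWrapper : llvm::ExternalAAWrapperPass {
  static char ID;
  NVPTXExternalAAWrapper()
      : ExternalAAWrapperPass([](llvm::Pass &P, llvm::Function &,
                                 llvm::AAResults &AAR) {
          if (auto *WrapperPass =
                  P.getAnalysisIfAvailable<NVPTXAAWrapperPass>())
            AAR.addAAResult(WrapperPass->getResult());
        }) {
    RunEarly = true; // assumed flag: evaluate this AA before Basic AA
  }
};
```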
Original implementation by Princeton Ferro <pferro@nvidia.com>
We bail out from the MaxVF calculation if the strides are not the same.
Instead, we depend on runtime checks, though these are not yet implemented.
We could instead use the maximum stride to compute a conservative upper bound.
This handles cases like the following, where the two accesses to `a` have
inner-loop strides of 1 and 4 elements:
```c
#define LEN 256 * 256
float a[LEN];
void gather() {
  for (int i = 0; i < LEN - 1024 - 255; i++) {
#pragma clang loop interleave(disable)
#pragma clang loop unroll(disable)
    for (int j = 0; j < 256; j++)
      a[i + j + 1024] += a[j * 4 + i];
  }
}
```
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
This is a resurrected version of the patch attached to this RFC:
https://discourse.llvm.org/t/rfc-constantdata-should-not-have-use-lists/42606
This adaptation differs in a few ways. In the original patch, each Use's
use-list pointer was replaced with an unsigned* pointing at the reference
count in the Value; this version leaves those pointers null and keeps the
ref-counting only in Value.
Remove use-lists from instances of ConstantData (which are shared
across modules and have no operands).
To continue supporting most of the use-list API, store a ref-count in
place of the use-list; this supports APIs like Value::use_empty and
Value::hasNUses. Operations that actually need the use-list -- like
Value::use_begin -- will assert.
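A minimal sketch of a guarded use-list traversal under this scheme
(`process` is a placeholder callback; `Value::hasUseList` is the query
mentioned in the follow-ups below):
```cpp
#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"

void process(llvm::User *U); // placeholder

void visitUsers(llvm::Value *V) {
  if (!V->hasUseList())
    return; // ConstantData: only ref-count queries (use_empty, hasNUses) work
  for (llvm::User *U : V->users())
    process(U);
}
```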
This change has three benefits:
1. The compiler output cannot in any way depend on the use-list order
of instances of ConstantData.
2. There's no use-list traffic when adding and removing simple
constants from operand lists (although there is ref-count traffic;
YMMV).
3. It's cheaper to serialize use-lists (since we're no longer
serializing the use-list order of things like i32 0).
The downside is that you can't look at all the users of ConstantData,
but traversals of users of i32 0 are already ill-advised.
Possible follow-ups:
- Track if an instance of a ConstantVector/ConstantArray/etc. is known
to have all ConstantData arguments, and drop the use-lists to
ref-counts in those cases. Callers need to check Value::hasUseList
before iterating through the use-list.
- Remove even the ref-counts. I'm not sure they have any benefit
besides minimizing the scope of this commit, and maintaining the
counts is not free.
Fixes #58629
Co-authored-by: Duncan P. N. Exon Smith <dexonsmith@apple.com>
Since e39f6c1844fab59c638d8059a6cf139adb42279a, opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present, to avoid the
string becoming out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated, so that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.
Reviewed By: shiltian, arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/137921
- Add a new pass manager version of `MachineUniformityAnalysis`.
- Query `TargetTransformInfo` in the new pass manager version.
- Use `printAsOperand` when printing the machine function name.
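For illustration, a new-PM machine analysis generally takes this shape (the names here are assumed from the bullets above, not copied from the patch):
```cpp
#include "llvm/CodeGen/MachinePassManager.h"
#include "llvm/CodeGen/MachineUniformityAnalysis.h"
#include "llvm/IR/PassManager.h"

class MachineUniformityAnalysisPass
    : public llvm::AnalysisInfoMixin<MachineUniformityAnalysisPass> {
  friend llvm::AnalysisInfoMixin<MachineUniformityAnalysisPass>;
  static llvm::AnalysisKey Key;

public:
  using Result = llvm::MachineUniformityInfo;
  // TargetTransformInfo is queried inside run(), per the bullet above.
  Result run(llvm::MachineFunction &MF,
             llvm::MachineFunctionAnalysisManager &MFAM);
};
```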
Similar to 1b7ef6aac8a3cad245c0ed14fe21725e31261f73, add a check to only
set MinAbsVarIndex if abs(Scale*V0) and abs((-Scale)*V1) won't wrap. In
the absence of IsNSW, try to use the bit widths of the original V and
Scale to rule out wrapping: for example, if V was extended from i8 and
Scale fits in four bits, then |Scale*V| < 2^12, which cannot wrap a
64-bit index.
LoopAccessAnalysis has code for handling function calls where the
function is marked with the 'convergent' attribute, but the test
coverage is insufficient. Fix this by adding a no-runtime-checks test
adapted from LoopDistribute, and by cleaning up the existing
runtime-checks test. Also regenerate the test file with
UpdateTestChecks.
The cost model used to return -1 for unknown costs, but over time
this has largely been removed. This cleans up some of the uses that have
remained, using 0 (free) for the cost of an insert and 1 (basic) for the
cost of anything that is unknown.
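For reference, "free" and "basic" line up with the standard TTI cost constants; whether the patch spells them this way is an assumption:
```cpp
#include "llvm/Analysis/TargetTransformInfo.h"

// Illustrative mapping only:
llvm::InstructionCost InsertCost = llvm::TargetTransformInfo::TCC_Free;   // 0
llvm::InstructionCost UnknownCost = llvm::TargetTransformInfo::TCC_Basic; // 1
```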
Fixes https://github.com/llvm/llvm-project/issues/135667
Analyze and annotate `ResourceInfo` with the derived direction of calls
to updateCounter (if any).
This change only sets the value; any diagnostics that should be raised
must be raised elsewhere.
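A hypothetical sketch of the recorded direction (names assumed, not taken from the patch):
```cpp
// Derived per resource from its updateCounter calls, if any.
enum class ResourceCounterDirection {
  Unknown,   // no updateCounter calls seen
  Increment, // only updateCounter(..., +1)
  Decrement, // only updateCounter(..., -1)
  Invalid,   // conflicting directions; diagnosed elsewhere
};
```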
We had some code which tried to estimate legalization costs for
illegally typed shuffles, but it only handled the case of a widening
shuffle, and used a somewhat ad hoc heuristic. We can reuse the
processShuffleMask utility (which we already use for splitting across
individual vector registers when the exact VLEN is known) to perform the
same splitting, with the legal vector type as the unit of split instead.
This makes the costing both simpler and more robust.
Note that this swings costs for illegal shuffles pretty wildly, as we
were previously sometimes hitting the ad hoc code and sometimes falling
through into generic scalarization costing. I don't know that any of the
costs for the individual tests in tree are significant, but the test
which triggered my finding this was reported to me by Alexey, reduced
from something triggering a bad choice in SLP for x264. So this has the
potential to be somewhat high impact.
The PR also extends the code to cover bfloat vector compares that are
also promoted to float.
NOTE: There is a bail-out for the compares that are scalarised; it will
be removed by https://github.com/llvm/llvm-project/pull/135398.
`TotalRootEntryCount` captures how many times that root was entered, regardless of whether a profile was also collected (profile collection for a given root happens on only one thread at a time).
We don't do this in compiler_rt because the goal there is to flush out the data as fast as possible, so traversing and multiplying vectors is punted to the profile user.
We really just need to do this when flattening the profile so that the values across roots and flat profiles match. We could do it earlier, too - like when loading the profile - but it seems beneficial (at least for debugging) to keep the counter values the same as the loaded ones. We can revisit this later.
This fixes a crash reported at
https://github.com/llvm/llvm-project/pull/114250#issuecomment-2813686061
If the vector type isn't legal at all, e.g. bfloat without +zvfbfmin,
then the legalized type will be scalarized. So use getScalarType()
instead of getVectorElementType() when checking for f16/bf16.
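The pattern, sketched (the helper name is illustrative):
```cpp
#include "llvm/IR/Type.h"

// getScalarType() is safe whether legalization produced a vector or a
// scalar: it returns the element type for vectors and the type itself
// for scalars, so nothing asserts when the type was scalarized.
static bool isF16OrBF16(llvm::Type *LegalTy) {
  llvm::Type *EltTy = LegalTy->getScalarType();
  return EltTy->isHalfTy() || EltTy->isBFloatTy();
}
```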
This likely does not alter much yet, given how the costs are currently
used. Like other cost functions, the CostKind should be passed into and
through the function.
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.
This patch also adds the scheduler description for the z17 processor,
provided by Jonas Paulsson.
Same idea as in #134723 - flatten indirect call info in `"VP"` `MD_prof` metadata for the thin link, for cases that aren't covered by a contextual profile. If we don't ICP an indirect call target in the specialized module, the call will go to the copy of that target outside the specialized module. If the graph under that target also has some indirect calls, then in the absence of this pass we'd have a steeper performance regression, because none of those would have a chance to be ICPed.
Flatten the profile pre-thinlink so that ThinLTO has something to work with for the parts of the binary that aren't covered by contextual profiles. Post-thinlink, the flattener is re-run and will actually change profile info, but just for the modules containing contextual trees ("specialized modules"). For the rest, the flattener just yanks out the instrumentation.
This is a much smaller, technically orthogonal patch similar to #134505. It
states that an extractvalue(Argument) can be treated like an Argument for alias
analysis, where the extractvalue acts like a phi / copy. No inttoptr here.
After #134340, the availability of a contextual profile isn't in itself an indication that the module being compiled contains all the functions covered by that profile.
We will subsequently treat the whole profile as "flat" in the frontend (i.e. flatten it and combine it with the flat profile section), so we can have a profile for ThinLTO for the parts of the application that don't come under the contextual profile. After ThinLTO, we will treat the module(s) containing contextual trees differently: they'll have only the contextual profile pertinent to them. The rest of the modules (non-contextual) will proceed "as usual", off the flattened profile.
This patch implements pruning of the contextual profile to enable the above.
Previously only fixed vector splats were handled. This adds support for
scalable vectors too, by allowing ConstantExpr splats.
We need to add the extra V->getType()->isVectorTy() check because a
ConstantExpr might be a scalar-to-vector bitcast.
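The shape of the guard, sketched (the helper name is illustrative):
```cpp
#include "llvm/IR/Constants.h"

// Only query the splat value when the constant really is vector-typed;
// an arbitrary ConstantExpr is not guaranteed to be.
static llvm::Constant *getSplatIfVector(llvm::Constant *C) {
  if (!C->getType()->isVectorTy())
    return nullptr;
  return C->getSplatValue();
}
```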
By allowing ConstantExprs, this also allows fixed vector ConstantExprs to
be folded, which causes the diffs in
llvm/test/Analysis/ValueTracking/known-bits-from-operator-constexpr.ll
and llvm/test/Transforms/InstSimplify/ConstProp/cast-vector.ll. I can
remove them from this PR if reviewers would prefer.
Fixes #132922
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual IR could be independent
of how the debug-info was represented inside a module, and whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.
We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.
There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. Things like
print-non-instruction-debug-info.ll: the test suite now checks for
debug records in all tests, and we don't want to check that we can print
as intrinsics. Or the update_test_checks tests -- these are duplicated
with write-experimental-debuginfo=false to ensure file writing for
intrinsics is correct, but that's something we're imminently going to
delete.
A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing that intrinsics-in-memory get
salvaged; that isn't necessary now.
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values; dbg-records take the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating it until now.
This is just one of those silent codegen-change issues that get fixed
by RemoveDIs.
Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
The patch splits the store-load forwarding distance analysis from the
other dependency analysis in LAA. Currently it supports only power-of-2
distances; the split is required to support non-power-of-2 distances in
the future.
Part of #100755