5072 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
01761a73e4
[AMDGPU] Add missing intrinsic declaration to intrinsics.ll. NFC. (#138954) 2025-05-08 00:42:35 -07:00
Matt Arsenault
9383fb23e1
Reapply "IR: Remove uselist for constantdata (#137313)" (#138961)
Reapply "IR: Remove uselist for constantdata (#137313)"

This reverts commit 5936c02c8b9c6d1476f7830517781ce8b6e26e75.

Fix checking uselists of constants in assume bundle queries
2025-05-08 08:00:09 +02:00
Chengjun
934cfa796e
[NVPTX] Fix NVPTXAA_before_BasicAA Test (#138992)
Fix the failed test in the
[PR](https://github.com/llvm/llvm-project/pull/125965) by moving the
test to CodeGen/NVPTX.
2025-05-07 16:59:33 -07:00
Chengjun
94d933676c
[AA] Move Target Specific AA before BasicAA (#125965)
In this change, NVPTX AA is moved before Basic AA to potentially improve
compile time. Additionally, it introduces a flag in the
`ExternalAAWrapper` that allows other backends to run their
target-specific AA passes before Basic AA, if desired.

The change works for both New Pass Manager and Legacy Pass Manager.

Original implementation by Princeton Ferro <pferro@nvidia.com>
2025-05-07 15:25:48 -07:00
Florian Hahn
9da103ab9e
[LAA] Update remaining tests after 384a5b00a7. 2025-05-07 21:02:44 +01:00
vaibhav
384a5b00a7
[LAA] Use MaxStride instead of CommonStride to calculate MaxVF (#98142)
We bail out from MaxVF calculation if the strides are not the same.
Instead, we are dependent on runtime checks, though not yet implemented.
We could instead use the MaxStride to conservatively use an upper bound.

This handles cases like the following:
```c
#define LEN 256 * 256
float a[LEN];

void gather() {
  for (int i = 0; i < LEN - 1024 - 255; i++) {
  #pragma clang loop interleave(disable)
  #pragma clang loop unroll(disable)
    for (int j = 0; j < 256; j++)
      a[i + j + 1024] += a[j * 4 + i];
  }
}
```

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-05-07 21:02:21 +01:00
Harald van Dijk
32752913b1
[ARM] Do not assume memory intrinsics specify alignment. (#138356) 2025-05-07 16:25:03 +01:00
Kirill Stoimenov
5936c02c8b Revert "IR: Remove uselist for constantdata (#137313)"
Possibly breaks the build: https://lab.llvm.org/buildbot/#/builders/24/builds/8119

This reverts commit 87f312aad6ede636cd2de5d18f3058bf2caf5651.
2025-05-07 00:07:55 +00:00
Matt Arsenault
87f312aad6
IR: Remove uselist for constantdata (#137313)
This is a resurrected version of the patch attached to this RFC:

https://discourse.llvm.org/t/rfc-constantdata-should-not-have-use-lists/42606

In this adaptation, there are a few differences. In the original patch, the Use's
use list was replaced with an unsigned* to the reference count in the value. This
version leaves them as null and leaves the ref counting only in Value.

Remove use-lists from instances of ConstantData (which are shared
across modules and have no operands).

To continue supporting most of the use-list API, store a ref-count in
place of the use-list; this is for API like Value::use_empty and
Value::hasNUses.  Operations that actually need the use-list -- like
Value::use_begin -- will assert.

This change has three benefits:

 1. The compiler output cannot in any way depend on the use-list order
    of instances of ConstantData.

 2. There's no use-list traffic when adding and removing simple
    constants from operand lists (although there is ref-count traffic;
    YMMV).

 3. It's cheaper to serialize use-lists (since we're no longer
    serializing the use-list order of things like i32 0).

The downside is that you can't look at all the users of ConstantData,
but traversals of users of i32 0 are already ill-advised.

Possible follow-ups:
  - Track if an instance of a ConstantVector/ConstantArray/etc. is known
    to have all ConstantData arguments, and drop the use-lists to
    ref-counts in those cases.  Callers need to check Value::hasUseList
    before iterating through the use-list.
  - Remove even the ref-counts.  I'm not sure they have any benefit
    besides minimizing the scope of this commit, and maintaining the
    counts is not free.

Fixes #58629

Co-authored-by: Duncan P. N. Exon Smith <dexonsmith@apple.com>
2025-05-06 17:20:37 +02:00
Nikita Popov
027b203814
[BasicAA] Gracefully handle large LocationSize (#138528)
If the LocationSize is larger than the index space of the pointer type,
bail out instead of triggering an APInt assertion.

Fixes the issue reported at
https://github.com/llvm/llvm-project/pull/119365#issuecomment-2849874894.
2025-05-06 14:19:47 +02:00
Craig Topper
52d2b589b2
[IndVarSimplify] Set samesign when converting signed comparison to unsigned comparison in eliminateIVComparison. (#138215) 2025-05-02 08:17:45 -07:00
Paul Walker
0fab741a26
[LLVM][CodeGen][AArch64] Don't scalarise v8{f16,bf16} vsetcc operations. (#135398)
I have also removed custom promotion code for the v4{f16,bf16} cases
because the same common code can be used.
2025-05-01 13:06:39 +01:00
Alexander Richardson
ee13638362
[AMDGPU] Remove explicit datalayout from tests where not needed
Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present to avoid the
string becoming out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated to ensure that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.

Reviewed By: shiltian, arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137921
2025-04-30 10:58:17 -07:00
Vikram Hegde
f91a6e6dab
[SCEV] Reject comparision of pointers to different address spaces in SCEVWrapPredicate::implies (#137935) 2025-04-30 21:03:31 +05:30
paperchalice
159628cc22
[CodeGen] Port MachineUniformityAnalysis to new pass manager (#137578)
- Add new pass manager version of `MachineUniformityAnalysis `.
- Query `TargetTransformInfo` in new pass manager version.
- Use `printAsOperand` when printing machine function name
2025-04-30 10:44:06 +08:00
Craig Topper
fff622fbf7
[BasicAA] Account for wrapping when using abs(Scale*V0 + (-Scale)*V1) >= abs(Scale) (#137755)
Similar to 1b7ef6aac8a3cad245c0ed14fe21725e31261f73, add a check to only
set MinAbsVarIndex if abs(Scale*V0) and abs((-Scale)*V1) won't wrap. In
the absence of IsNSW, try to use the bitwidths of the original V and
Scale to rule out wrapping
2025-04-29 14:10:37 -07:00
Ramkumar Ramachandra
13b443f2ef
[LAA] Improve convergent tests (#136758)
LoopAccessAnalysis has code for handling function calls where the
function is marked with the 'convergent' attribute, but the test
coverage is insufficient. Fix this by adding a test showing the case of
no-runtime-checks adapted from LoopDistribute, and clean up the existing
test with runtime-checks. Also regenerate the test file with
UpdateTestChecks.
2025-04-29 09:50:33 +01:00
Alexey Bataev
ea1b525ceb [LAA] Add tests with non-power-of-2 store-load forward distance (#136710) 2025-04-28 15:10:55 -07:00
Alexey Bataev
88f8637d22 Revert "[LAA] Add tests with non-power-of-2 store-load forward distance (#136710)"
This reverts commit 51bbebb6677bb0ea14ca62cc140492965c2a6e19 to fix
buildbots https://lab.llvm.org/buildbot/#/builders/137/builds/17662
2025-04-28 14:36:44 -07:00
Alexey Bataev
51bbebb667
[LAA] Add tests with non-power-of-2 store-load forward distance (#136710) 2025-04-28 17:02:02 -04:00
David Green
86b7ce9497
[CostModel] Remove some negative costs. (#135533)
The cost model in the past returned -1 for unknown costs, but over time
this has largely been removed. This cleans up some of the uses that have
remained. It uses 0/free for the cost of an insert and 1/basic for the
cost of anything that is unknown.
2025-04-28 21:08:14 +01:00
Ashley Coleman
f12fb2ff74
[HLSL] Analyze updateCounter usage (#135669)
Fixes https://github.com/llvm/llvm-project/issues/135667

Analyze and annotate `ResourceInfo` with the derived direction of calls
to updateCounter (if any).

This change only sets the value. Any diagnostics that should be raised
must be done somewhere else.
2025-04-24 13:17:24 -06:00
Philip Reames
1c722fc8f5
[RISCV][TTI] Use processShuffleMask for shuffle legalization estimate (#136191)
We had some code which tried to estimate legalization costs for
illegally typed shuffles, but it only handled the case of a widening
shuffle, and used a somewhat adhoc heuristic. We can reuse the
processShuffleMask utility (which we already use for individual vector
register splitting when exact VLEN is known) to perform the same
splitting given the legal vector type as the unit of split instead. This
makes the costing both simpler and more robust.

Note that this swings costs for illegal shuffles pretty wildly as we
were previously sometimes hitting the adhoc code, and sometimes falling
through into generic scalarization costing. I don't know that any of the
costs for the individual tests in tree are significant, but the test
which which triggered me finding this was reported to me by Alexey
reduced from something triggering a bad choice in SLP for x264. So this
has the potential to be somewhat high impact.
2025-04-22 10:50:20 -07:00
Paul Walker
a095ebc58c
[LLVM][CostModel][AArch64] Remove magic numbers from f16 vector compares. (#135795)
The PR also extends the code to cover bfloat vector compares that are
also promoted to float.

NOTE: There is a bail out for the compares that are scalarised that will
be removed by https://github.com/llvm/llvm-project/pull/135398.
2025-04-22 11:20:17 +01:00
Mircea Trofin
93b74f7178
[ctxprof] Scale up everything under a root by its TotalRootEntryCount (#136015)
`TotalRootEntryCount` captures how many times that root was entered - regardless if a profile was also collected or not (profile collection for a given root happens on only one thread at a time).

We don't do this in compiler_rt because the goal there is to flush out the data as fast as possible, so traversing and multiplying vectors is punted to the profile user.

We really just need to do this when flattening the profile so that the values across roots and flat profiles match. We could do it earlier, too - like when loading the profile - but it seems beneficial (at least for debugging) to keep the counter values the same as the loaded ones. We can revisit this later.
2025-04-21 08:43:21 -07:00
Luke Lau
053451cb35 [RISCV] Handle scalarized reductions in getArithmeticReductionCost
This fixes a crash reported at
https://github.com/llvm/llvm-project/pull/114250#issuecomment-2813686061

If the vector type isn't legal at all, e.g. bfloat with +zvfbfmin,
then the legalized type will be scalarized. So use getScalarType()
instead of getVectorElement() when checking for f16/bf16.
2025-04-21 16:07:15 +08:00
David Green
2ba455ff3d
[AArch64] Add CostKind to getSpliceCost (#135537)
This likely does not alter much yet with how the costs are used. Like
other cost functions the CostKind should be passed into and through the
function.
2025-04-21 06:31:03 +01:00
Philip Reames
295e56c0a6 [RISCV] Add a couple of cost model tests for shuffles requiring legalization 2025-04-17 12:44:44 -07:00
Paul Walker
637f352c3e [NFC][CostModel][AArch64] Add bf16 coverage for compare tests. 2025-04-15 15:04:59 +00:00
Ashley Coleman
b07fc0fd5e
[HLSL] Move Resource Instance Properties from TypeInfo (#135259)
Fixes https://github.com/llvm/llvm-project/issues/134741

Moves Resource Instance properties from type info into resource info as
described in
https://github.com/llvm/wg-hlsl/blob/main/proposals/0022-resource-instance-analysis.md
2025-04-14 13:41:36 -06:00
Yingwei Zheng
bb9580a02b
[SCEV] Use ashr to adjust constant multipliers (#135534)
SCEV converts "-2 *nsw (i32 V)" into "2148473647 *nsw (i32 V)". But we
cannot preserve the nsw flag when the constant multiplier is negative.
This patch changes lshr to ashr so that we can preserve both nsw and nuw
flags.

Alive2 proof: https://alive2.llvm.org/ce/z/LZVSEa
Closes https://github.com/llvm/llvm-project/issues/135531.
2025-04-13 20:22:48 +08:00
David Green
052225dc03 [AArch64] Use a lower Costsize cost in getScalarizationOverhead.
This is a follow on to #130946 to use the same codesize cost override in
getScalarizationOverhead for vector instructions.
2025-04-11 20:18:26 +01:00
Yingwei Zheng
db27a0af5e
[AMDGPU][InstCombine][InstSimplify] Pre-commit tests for PR130742 (#135305)
https://github.com/llvm/llvm-project/pull/130742#discussion_r1993055149
2025-04-11 12:42:14 +08:00
Ulrich Weigand
80267f8148
Support z17 processor name and scheduler description (#135254)
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.

This patch also add the scheduler description for the z17 processor,
provided by Jonas Paulsson.
2025-04-11 00:20:58 +02:00
Mircea Trofin
442050ce8f
[ctxprof] Flatten indirect call info in pre-thinlink compilation (#134766)
Same idea as in #134723 - flatten indirect call info in `"VP"` `MD_prof` metadata for the thinlinker, for cases that aren't covered by a contextual profile. If we don't ICP an indirect call target in the specialized module, the call will fall to the copy of that target outside the specialized module. If the graph under that target also has some indirect calls, in the absence of this pass, we'd have a steeper performance regression - because none of those would have a chance to be ICPed.
2025-04-08 17:33:37 -07:00
Mircea Trofin
4c90d977db
[ctxprof] Use the flattened contextual profile pre-thinlink (#134723)
Flatten the profile pre-thinlink so that ThinLTO has something to work with for the parts of the binary that aren't covered by contextual profiles. Post-thinlink, the flattener is re-run and will actually change profile info, but just for the modules containing contextual trees ("specialized modules"). For the rest, the flattener just yanks out the instrumentation.
2025-04-08 17:30:49 -07:00
David Green
c23e1cb936
[BasicAA] Treat ExtractValue(Argument) similar to Argument in relation to function-local objects. (#134716)
This is a much smaller, technically orthogonal patch similar to #134505. It
states that a extractvalue(Argument) can be treated like an Argument for alias
analysis, where the extractelement acts like a phi / copy. No inttoptr here.
2025-04-08 10:05:58 +01:00
David Green
9d82ab8a82 [BasicAA] Add some test cases for coerced function args 2025-04-08 10:02:24 +01:00
Mircea Trofin
f1bb2fe356
[ctxprof] Use isInSpecializedModule as criteria for using contextual profile (#134468)
After #134340, the availability of contextual profile isn't in itself an indication of compiling the module containing all the functions covered by that profile.
2025-04-07 19:55:00 -07:00
Mircea Trofin
6a3e5f89bb
[ctxprof] Only prune the profile in modules containing only context trees (#134340)
We will subsequently treat the whole profile as "flat" in the frontend, (i.e flatten and combine with the flat profile section), so we can have a profile for ThinLTO for parts of the application that don't come under the contextual profile. After ThinLTO, we will treat the module(s) containing contextual trees differently: they'll have only the contextual profile pertinent to them. The rest of the modules (non-contextual) will proceed "as usual", off the flattened profile.

This patch implements pruning of the contextual profile to enable the above.
2025-04-07 19:52:03 -07:00
Ashley Coleman
e3369a8dc9
[NFC][HLSL] Rename ResourceBinding Types (#134165)
Non-functional change as first step in
https://github.com/llvm/wg-hlsl/pull/207

Removes `Binding` from "Resource Instance" types
2025-04-04 16:51:35 -06:00
Luke Lau
79435de8a5
[ConstantFold] Support scalable constant splats in ConstantFoldCastInstruction (#133207)
Previously only fixed vector splats were handled. This adds supports for
scalable vectors too by allowing ConstantExpr splats.

We need to add the extra V->getType()->isVectorTy() check because a
ConstantExpr might be a scalar to vector bitcast.

By allowing ConstantExprs this also allow fixed vector ConstantExprs to
be folded, which causes the diffs in
llvm/test/Analysis/ValueTracking/known-bits-from-operator-constexpr.ll
and llvm/test/Transforms/InstSimplify/ConstProp/cast-vector.ll. I can
remove them from this PR if reviewers would prefer.

Fixes #132922
2025-04-03 16:24:56 +01:00
David Green
c51b24c36a [AArch64] Use getVectorInstrCost in div cost
The costs of ExtractElement and InsertElement should be obtained via
getVectorInstrCost.
2025-04-02 14:51:22 +01:00
Yingwei Zheng
f066d7504e
[Reland][SCEV] teach isImpliedViaOperations about samesign (#133711)
This patch relands https://github.com/llvm/llvm-project/pull/124270.
Closes https://github.com/llvm/llvm-project/issues/126409.

The root cause is that we incorrectly preserve the samesign flag after
truncating operands of an icmp:
https://alive2.llvm.org/ce/z/4NE9gS

---------

Co-authored-by: Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com>
2025-04-02 18:45:33 +08:00
Jeremy Morse
1ebc308bba
[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855)
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual could be independent of
how the debug-info was represented inside a module, whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.

We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.

There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. Things like
print-non-instruction-debug-info.ll , the test suite now checks for
debug records in all tests, and we don't want to check we can print as
intrinsics. Or the update_test_checks tests -- these are duplicated with
write-experimental-debuginfo=false to ensure file writing for intrinsics
is correct, but that's something we're imminently going to delete.

A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing intrinsics-in-memory get
salvaged, isn't necessary now
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values, dbg-records takes the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating the tests
until now. This is just one of those silent gnochange issues that get
fixed by RemoveDIs.

Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
2025-04-01 14:27:11 +01:00
David Green
2653eb52d1 [AArch64] Add -cost-kind=all coverage for sve cost tests. NFC 2025-03-31 19:49:39 +01:00
David Green
668edb43a0 [AArch64] Update more costmodel tests with -cost-kind=all. NFC 2025-03-31 19:09:03 +01:00
David Green
9cdab16da9 [AArch64] Remove CODE llc run lines from costmodel tests. NFC
The code is already tested in CodeGen/AArch64 tests such as neon-perm.ll and
the select- tests.
2025-03-31 17:44:39 +01:00
Alexey Bataev
78777a204a
[LV]Split store-load forward distance analysis from other checks, NFC (#121156)
The patch splits the store-load forwarding distance analysis from other
dependency analysis in LAA. Currently it supports only power-of-2
distances, required to support non-power-of-2 distances in future.

Part of #100755
2025-03-31 07:28:44 -04:00
Matt Arsenault
94122d58fc
Lint: Replace -lint-abort-on-error cl::opt with pass parameter (#132933) 2025-03-31 08:42:51 +07:00