673 Commits

Author SHA1 Message Date
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
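A minimal before/after sketch of the IR (illustrative names; the only change assumed here is that the intrinsic drops its explicit i64 size operand):
```
%buf = alloca [16 x i8]
; Before: the size had to be spelled out and match the alloca.
call void @llvm.lifetime.start.p0(i64 16, ptr %buf)
; After: the size is implied by the alloca itself.
call void @llvm.lifetime.start.p0(ptr %buf)
```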
2025-08-08 11:09:34 +02:00
Paul Walker
ceb2b9c141
[LLVM][DAGCombiner] fold (shl (X * vscale(C0)), C1) -> (X * vscale(C0 << C1)). (#150651) 2025-08-01 11:42:45 +01:00
Nikita Popov
5f531827a4
[LSR] Do not consider uses in lifetime intrinsics (#149492)
We should ignore uses of pointers in lifetime intrinsics, as these are
not actually materialized in the final code, so don't affect register
pressure or anything else LSR needs to model.
    
Handling these only results in peculiar rewrites where additional
intermediate GEPs are introduced.
2025-07-18 16:13:00 +02:00
Shan Huang
089106fdfb
[DebugInfo][LoopStrengthReduce] Salvage the debug value of the dead cmp instruction (#147241)
Fix #147238
2025-07-14 09:45:37 +08:00
Nikita Popov
a9c6f1ed4f Revert "[LSR] Regenerate test checks (NFC)"
This reverts commit a60405c2d51ffe428dd38b8b1020c19189968f76.

Causes buildbot failures.
2025-07-11 14:11:42 +02:00
Nikita Popov
a60405c2d5 [LSR] Regenerate test checks (NFC) 2025-07-11 13:14:02 +02:00
Sergey Shcherbinin
8472eb1361
[AArch64LoadStoreOpt] BaseReg update is searched also in CF successor (#145583)
Look for the register-update instruction (to merge with the memory instruction into
pre/post-increment form) not only inside a single MBB but also along a
CF path going downward without side entries, such that BaseReg is live along
it but not at its exits.

Regression test is updated accordingly.
2025-07-11 11:06:11 +01:00
Guy David
76274eb2b3
[PHIElimination] Revert #131837 #146320 #146337 (#146850)
Reverting because these changes cause mis-compiles:
- https://github.com/llvm/llvm-project/pull/131837
- https://github.com/llvm/llvm-project/pull/146320
- https://github.com/llvm/llvm-project/pull/146337
2025-07-03 07:48:08 -04:00
Guy David
f5c62ee0fa
[PHIElimination] Reuse existing COPY in predecessor basic block (#131837)
The insertion point of the COPY isn't always optimal and could eventually
lead to a worse block layout; see the regression test in the first
commit.

This change affects many architectures, but the total instruction count
in the test cases seems to be slightly lower.
2025-06-29 21:28:42 +03:00
John Brawn
a54712c8ec
[LSR] Make canHoistIVInc allow non-integer types (#143707)
canHoistIVInc was made to only allow integer types to avoid a crash in
isIndexedLoadLegal/isIndexedStoreLegal, which would fail an assertion in
getValueType (or rather in MVT::getVT, which it calls) when passed a
struct type. Adjusting these functions to pass AllowUnknown=true to
getValueType avoids the assertion failure (MVT::Other is returned, for
which TLI->isIndexedLoadLegal should then return false), so the
integer-type check can be removed.
2025-06-16 15:23:40 +01:00
Ruiling, Song
0487db1f13
MachineScheduler: Improve instruction clustering (#137784)
The existing way of managing clustered nodes was to add weak edges
between neighbouring cluster nodes, forming a sort of ordered queue,
which was later recorded as `NextClusterPred` or `NextClusterSucc` in
`ScheduleDAGMI`.

But the instructions may not be picked in the exact order of that queue.
For example, with a queue of cluster nodes A B C, node B might be picked
first during scheduling, and then it is very likely that we only cluster
B and C for top-down scheduling (leaving A alone).

Another issue is:
```
   if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum)
      std::swap(SUa, SUb);
   if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster)))
```
may break the cluster queue.

For example, suppose we want to cluster nodes (in the order they appear in
`MemOpRecords`): 1 3 2. Normally 1 (SUa) becomes a pred of 3 (SUb). But when
it comes to (3, 2), since 3 (SUa) > 2 (SUb), we reorder the two nodes, which
makes 2 a pred of 3. Now both 1 and 2 are preds of 3, but there is no edge
between 1 and 2, so we get a broken cluster chain.

To fix both issues, this change introduces an unordered set. This
should help improve clustering in some hard cases.

One key reason this change causes so many test-check updates is that,
since the cluster candidates are no longer ordered, they may be picked
in a different order than before.

The most affected targets are: AMDGPU, AArch64, RISCV.

For RISCV, most changes appear to be minor instruction reordering; I
don't see obvious regressions.

For AArch64, some combining of ldr into ldp is affected, with two cases
regressed and two improved. The deeper reason is that the machine
scheduler cannot cluster these well either before or after the change,
and the later load-combine algorithm is also not smart enough.

For AMDGPU, some cases use more v_dual instructions while others
regress; this seems less critical. The test `v_vselect_v32bf16` appears
to get more buffer_load instructions grouped into clauses.
2025-06-05 15:28:04 +08:00
Alexander Richardson
ee13638362
[AMDGPU] Remove explicit datalayout from tests where not needed
Since e39f6c1844fab59c638d8059a6cf139adb42279a, opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present, so the string
doesn't fall out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated, so that tests for non-target-specific passes that happen
to use the AMDGPU layout still pass when building with a limited set of
targets.

Reviewed By: shiltian, arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137921
2025-04-30 10:58:17 -07:00
Ricardo Jesus
34f9ddf1ce
[AArch64][SVE] Fold ADD+CNTB to INCB/DECB (#118280)
Currently, given:
```cpp
uint64_t incb(uint64_t x) {
  return x+svcntb();
}
```
LLVM generates:
```gas
incb:
        addvl   x0, x0, #1
        ret
```
Which is equivalent to:
```gas
incb:
        incb    x0
        ret
```

However, on microarchitectures like the Neoverse V2 and Neoverse V3,
the second form (with INCB) can have significantly better latency and
throughput (according to their SWOG). On the Neoverse V2, for example,
ADDVL has a latency and throughput of 2, whereas some forms of INCB
have a latency of 1 and a throughput of 4. The same applies to DECB.
This patch adds patterns to prefer the cheaper INCB/DECB forms over
ADDVL where applicable.
2025-04-17 08:41:17 +01:00
YunQiang Su
1c7ab39f3d
MIPS: Set EnableLoopTermFold (#133454)
Setting `EnableLoopTermFold` enables the `loop-term-fold` pass.
2025-03-29 21:51:59 +08:00
Weaver
af150272cf Revert "MIPS: Set EnableLoopTermFold (#133378)"
This reverts commit 71f43a7c42a37d18be98b0885d62ef76e658f242.

Caused build bot failures:
https://lab.llvm.org/buildbot/#/builders/190/builds/17267
https://lab.llvm.org/buildbot/#/builders/144/builds/21445
https://lab.llvm.org/buildbot/#/builders/46/builds/14343

Please consider fixing the test failure in long-array-initialize.ll before
recommitting.
2025-03-28 11:31:56 +00:00
YunQiang Su
71f43a7c42
MIPS: Set EnableLoopTermFold (#133378)
Setting `EnableLoopTermFold` enables the `loop-term-fold` pass.
2025-03-28 16:49:45 +08:00
YunQiang Su
6dd14c881c
MIPS: Implements MipsTTIImpl::isLSRCostLess using Insns as first (#133068)
This lets LoopStrengthReduce work for MIPS.
The code is copied from RISC-V.

---------

Co-authored-by: qethu <190734095+qethu@users.noreply.github.com>
2025-03-26 23:01:26 +08:00
Jeremy Morse
792a6f8119
[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298)
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.

Nowadays, the non-intrinsic format is the default and has been on for more
than a year, so there's no need for this flag to exist.

(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
2025-03-14 15:50:49 +00:00
Ricardo Jesus
15fbdc2b96
[AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (#127837)
Currently, given:
```cpp
svuint8_t foo(uint8_t *x) {
  return svld1(svptrue_b8(), x);
}
```
We generate:
```gas
foo:
  ptrue   p0.b
  ld1b    { z0.b }, p0/z, [x0]
  ret
```
However, on little-endian and with unaligned memory accesses allowed, we
could instead be using LDR as follows:
```gas
foo:
  ldr     z0, [x0]
  ret
```

The second form avoids the predicate dependency.
Likewise for other types and stores.
2025-02-26 13:56:35 +00:00
Sergei Barannikov
ff9c041d96
[MachineScheduler] Fix physreg dependencies of ExitSU (#123541)
Providing the correct operand index allows addPhysRegDataDeps to compute
the correct latency.

Pull Request: https://github.com/llvm/llvm-project/pull/123541
2025-02-01 20:40:50 +03:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
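An illustrative sketch of the attribute upgrade on a declaration (hypothetical function @use):
```
; Old spelling:
declare void @use(ptr nocapture)
; After the upgrade performed by the bitcode and textual IR readers:
declare void @use(ptr captures(none))
```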
2025-01-29 16:56:47 +01:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying an NVPTX kernel than using metadata, which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
2025-01-07 18:24:50 -08:00
Fangrui Song
9efa7d7af3
Remove -print-lsr-output in favor of --stop-after=loop-reduce
Pull Request: https://github.com/llvm/llvm-project/pull/121305
2024-12-29 18:58:30 -08:00
Lee Wei
abb9f9fa06
[llvm] Remove br i1 undef from some regression tests [NFC] (#117112)
This PR removes tests with `br i1 undef` under
`llvm/tests/Transforms/Loop*, Lower*`.
2024-11-21 08:06:56 +00:00
Nikita Popov
6dc23b7009
[SCEVExpander] Don't try to reuse SCEVUnknown values (#115141)
The expansion of a SCEVUnknown is trivial (it's just the wrapped value).
If we try to reuse an existing value it might be a more complex
expression that simplifies to the SCEVUnknown.

This is inspired by https://github.com/llvm/llvm-project/issues/114879,
because SCEVExpander replacing a constant with a phi node is just silly.
(I don't consider this a fix for that issue though.)
2024-11-11 12:36:29 +01:00
Yingwei Zheng
0b9f1cc024
[SCEV] Disallow simplifying phi(undef, X) to X (#115109)
See the following case:
```
@GlobIntONE = global i32 0, align 4

define ptr @src() {
entry:
  br label %for.body.peel.begin

for.body.peel.begin:                              ; preds = %entry
  br label %for.body.peel

for.body.peel:                                    ; preds = %for.body.peel.begin
  br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel

cleanup.loopexit.peel:                            ; preds = %for.body.peel
  br label %cleanup.peel

cleanup.peel:                                     ; preds = %cleanup.loopexit.peel, %for.body.peel
  %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]
  br i1 true, label %for.body.peel.next, label %cleanup7

for.body.peel.next:                               ; preds = %cleanup.peel
  br label %for.body.peel.next1

for.body.peel.next1:                              ; preds = %for.body.peel.next
  br label %entry.peel.newph

entry.peel.newph:                                 ; preds = %for.body.peel.next1
  br label %for.body

for.body:                                         ; preds = %cleanup, %entry.peel.newph
  %retval.0 = phi ptr [ %retval.2.peel, %entry.peel.newph ], [ %retval.2, %cleanup ]
  br i1 false, label %cleanup, label %cleanup.loopexit

cleanup.loopexit:                                 ; preds = %for.body
  br label %cleanup

cleanup:                                          ; preds = %cleanup.loopexit, %for.body
  %retval.2 = phi ptr [ %retval.0, %for.body ], [ @GlobIntONE, %cleanup.loopexit ]
  br i1 false, label %for.body, label %cleanup7.loopexit

cleanup7.loopexit:                                ; preds = %cleanup
  %retval.2.lcssa.ph = phi ptr [ %retval.2, %cleanup ]
  br label %cleanup7

cleanup7:                                         ; preds = %cleanup7.loopexit, %cleanup.peel
  %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ]
  ret ptr %retval.2.lcssa
}

define ptr @tgt() {
entry:
  br label %for.body.peel.begin

for.body.peel.begin:                              ; preds = %entry
  br label %for.body.peel

for.body.peel:                                    ; preds = %for.body.peel.begin
  br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel

cleanup.loopexit.peel:                            ; preds = %for.body.peel
  br label %cleanup.peel

cleanup.peel:                                     ; preds = %cleanup.loopexit.peel, %for.body.peel
  %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]
  br i1 true, label %for.body.peel.next, label %cleanup7

for.body.peel.next:                               ; preds = %cleanup.peel
  br label %for.body.peel.next1

for.body.peel.next1:                              ; preds = %for.body.peel.next
  br label %entry.peel.newph

entry.peel.newph:                                 ; preds = %for.body.peel.next1
  br label %for.body

for.body:                                         ; preds = %cleanup, %entry.peel.newph
  br i1 false, label %cleanup, label %cleanup.loopexit

cleanup.loopexit:                                 ; preds = %for.body
  br label %cleanup

cleanup:                                          ; preds = %cleanup.loopexit, %for.body
  br i1 false, label %for.body, label %cleanup7.loopexit

cleanup7.loopexit:                                ; preds = %cleanup
  %retval.2.lcssa.ph = phi ptr [ %retval.2.peel, %cleanup ]
  br label %cleanup7

cleanup7:                                         ; preds = %cleanup7.loopexit, %cleanup.peel
  %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ]
  ret ptr %retval.2.lcssa
}
```
1. `simplifyInstruction(%retval.2.peel)` returns `@GlobIntONE`. Thus,
`ScalarEvolution::createNodeForPHI` returns SCEV expr `@GlobIntONE` for
`%retval.2.peel`.
2. `SimplifyIndvar::replaceIVUserWithLoopInvariant` tries to replace the
use of `%retval.2.peel` in `%retval.2.lcssa.ph` with `@GlobIntONE`.
3. `simplifyLoopAfterUnroll -> simplifyLoopIVs -> SCEVExpander::expand`
reuses `%retval.2.peel = phi ptr [ undef, %for.body.peel ], [
@GlobIntONE, %cleanup.loopexit.peel ]` to generate code for
`@GlobIntONE`, which is incorrect.

This patch disallows simplifying `phi(undef, X)` to `X` by setting
`CanUseUndef` to false.
Closes https://github.com/llvm/llvm-project/issues/114879.
2024-11-07 15:53:51 +08:00
Youngsuk Kim
caa32e6d6f
[llvm][LSR] Fix where invariant on ScaledReg & Scale is violated (#112576)
The comments attached to the `ScaledReg` field of `struct Formula` explain
that `ScaledReg` must be non-null when `Scale` is non-zero.

This fixes up a code path where that invariant is violated, and adds an
assert to ensure the invariant holds.

Without this patch, compiler aborts with the attached test case.

Fixes #76504
2024-10-17 10:47:44 -04:00
Orlando Cazalet-Hyams
7506872afc
[DebugInfo][LSR] Fix assertion failure salvaging IV with offset > 64 bits wide (#110979)
Fixes #110494
2024-10-03 11:47:08 +01:00
Jeremy Morse
e6bf48d110
[X86] Don't request 0x90 nop filling in p2align directives (#110134)
As of rev ea222be0d, LLVM's assembler will actually try to honour the
"fill value" part of p2align directives. X86 printed these as 0x90, which
isn't actually what it wanted: we want multi-byte nops for .text
padding. Compiling via a textual assembly file produces single-byte
nop padding since ea222be0d but the built-in assembler will produce
multi-byte nops. This divergent behaviour is undesirable.

To fix: don't set the byte padding field for x86, which allows the
assembler to pick multi-byte nops. Test that we get the same multi-byte
padding when compiled via textual assembly or directly to object file.
Added same-align-bytes-with-llasm-llobj.ll to that effect and updated
numerous other tests to drop check lines for the explicit padding.
2024-10-02 11:14:05 +01:00
Nikita Popov
9f3d1695eb
[SCEVExpander] Preserve gep nuw during expansion (#102133)
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
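A minimal sketch of the kind of gep this enables (hypothetical values; assumes the nuw flag from the SCEV add carries over to the emitted gep):
```
; An add SCEV known not to wrap unsigned can now be expanded as:
%addr = getelementptr nuw i8, ptr %base, i64 %off
```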
2024-10-02 11:45:00 +02:00
Nikita Popov
4b3ba64ba7
[SCEVExpander] Clear flags when reusing GEP (#109293)
As pointed out in the review of #102133, SCEVExpander currently
incorrectly reuses GEP instructions that have poison-generating flags
set. Fix this by clearing the flags on the reused instruction.
2024-10-01 14:22:54 +02:00
Nikita Popov
c4744e49f6 [LSR] Regenerate test checks (NFC) 2024-09-19 16:34:10 +02:00
Sergey Kachkov
1f2a634c44 Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one corresponds to the
i variable (add rdx, 4), the other to res (add rax, 2). The second
induction variable can be removed by the rewriteLoopExitValues() method
(the final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at the loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires the %lsr.iv.next value to have only 2 uses:
one in the LCSSA phi node, the other in the induction phi node. Here we have
3 uses of this value because of the duplicated LCSSA nodes, so the transform
doesn't apply, leaving an extra add operation inside the loop. The
proposed solution is to accumulate the inserted instructions that will
require an LCSSA form update into a SetVector and then call
formLCSSAForInstructions for this SetVector once, so the same
instructions aren't processed twice.

The reland fixes the issue with the preserve-lcssa.ll test: it failed when the
x86_64-unknown-linux-gnu target was unavailable in opt. The changes are
moved into a separate duplicated-phis.ll test with an explicit x86 target
requirement to fix bots that do not build this target.
2024-09-09 16:14:51 +03:00
Sergey Kachkov
17f0c5dfaa [LSR][NFC] Add pre-commit test 2024-09-09 16:14:50 +03:00
dyung
2bf551e600
Revert "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107666)
Reverts llvm/llvm-project#107380

Change is causing the test preserve-lcssa.ll to fail on at least 2 build
bots:
- https://lab.llvm.org/buildbot/#/builders/190/builds/5231
- https://lab.llvm.org/buildbot/#/builders/161/builds/1855
2024-09-06 19:54:26 -07:00
Sergey Kachkov
2cb4d1b1bd
[LSR] Do not create duplicated PHI nodes while preserving LCSSA form (#107380)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one corresponds to the
i variable (add rdx, 4), the other to res (add rax, 2). The second
induction variable can be removed by the rewriteLoopExitValues() method
(the final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at the loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires the %lsr.iv.next value to have only 2 uses:
one in the LCSSA phi node, the other in the induction phi node. Here we have
3 uses of this value because of the duplicated LCSSA nodes, so the transform
doesn't apply, leaving an extra add operation inside the loop. The
proposed solution is to accumulate the inserted instructions that will
require an LCSSA form update into a SetVector and then call
formLCSSAForInstructions for this SetVector once, so the same
instructions aren't processed twice.
2024-09-06 18:39:47 +03:00
Philip Reames
27a62ec72a
[LSR] Split the -lsr-term-fold transformation into its own pass (#104234)
This transformation doesn't actually use any of the internal state of
LSR and recomputes all information from SCEV.  Splitting it out makes
it easier to test.
    
Note that long term I would like to write a version of this transform
which *is* integrated with LSR's solver, but if that happens, we'll
just delete the extra pass.
    
Integration-wise, I switched from using TTI to using a pass configuration
variable.  This seems slightly more idiomatic, and means we don't run
the extra logic on any target other than RISCV.
2024-08-17 18:34:23 -07:00
Benjamin Maxwell
7fad04e94b
[LSR] Fix matching vscale immediates (#100080)
Somewhat confusingly, a `SCEVMulExpr` is a `SCEVNAryExpr`, so it can have
more than 2 operands. Previously, the vscale immediate matching did not check
the number of operands of the `SCEVMulExpr`, so it would ignore any
operands after the first two.

This led to incorrect codegen (and results) for ArmSME in IREE
(https://github.com/iree-org/iree), which sometimes addresses things
that are a `vscale * vscale` multiple away. The test added with this
change shows an example reduced from IREE. The second write should
be offset from the first `16 * vscale * vscale` (* 4 bytes), however,
previously LSR dropped the second vscale and instead offset the write by
`#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).
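An illustrative IR fragment of the kind of addressing involved (hypothetical names; the offset is a multiple of vscale * vscale):
```
declare i64 @llvm.vscale.i64()

define void @store_at_vscale_sq_offset(ptr %base) {
  %vs = call i64 @llvm.vscale.i64()
  %vs.sq = mul i64 %vs, %vs          ; vscale * vscale
  %off = mul i64 %vs.sq, 16          ; 16 * vscale * vscale elements
  %addr = getelementptr float, ptr %base, i64 %off
  store float 0.0, ptr %addr
  ret void
}
```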
2024-07-24 10:06:34 +01:00
Benjamin Maxwell
c1b70fa5bf Precommit vscale-fixups.ll test (NFC)
Precommit test for #100080.
2024-07-23 10:03:14 +00:00
Luke Lau
557ef043af
[RISCV] Copy AVLs whose LiveIntervals aren't extendable in insertVSETVLI (#98342)
Currently, before forwarding an AVL, we do a simple non-exhaustive check
to see if its LiveInterval is extendable. But we also need to check for
this when we're extending an AVL's LiveInterval via merging the
VSETVLIInfos in transferBefore with equally zero AVLs.

Rather than trying to conservatively prevent these cases, this inserts a
copy of the AVL instead if we don't know we'll be able to extend it.
This is likely to be more robust, and even if the extra copy is
undesirable these cases should be rare in practice.
2024-07-15 13:18:12 +08:00
Shan Huang
d83d09facd
[DebugInfo][LoopStrengthReduce] Fix missing debug location updates (#97519)
Fixes #97510.

Note that, for the new phi instruction `NewPH`, which replaces the old
phi `PH` and the cast `ShadowUse`, I chose to propagate the debug
location of `PH` to it, because the cast is eliminated per the
optimization's semantics.
2024-07-15 09:44:18 +08:00
Alex Bradbury
f8dbe1d09d
Revert "[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default" (#98328)
Reverts llvm/llvm-project#89927 while we investigate performance
regressions reported by @dtcxzyw
2024-07-10 15:33:20 +01:00
Alex Bradbury
af47a4ec50
[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default (#89927)
This avoids some cases where LSR produces results that lead to very poor
codegen. There's a chance we'll see minor degradations for some inputs,
in cases where our metrics say the found solution is worse but it is in
fact better than the starting point.

Per the review thread, at least one vendor has been enabling this by
default for some time and found that overall it's an improvement. As such,
we'll enable it by default and aim to fix any as-yet-unknown regressions
in-tree.
2024-07-10 13:23:31 +01:00
Luke Lau
db782b44b3
[RISCV] Don't forward AVL in VSETVLIInfo if it would clobber other definitions (#97264)
This fixes a crash found when compiling OpenBLAS with -mllvm
-verify-machineinstrs.

When we "forward" the AVL from the output of a vsetvli, we might have to
extend the LiveInterval of the AVL to where insert the new vsetvli.

Most of the time we are able to extend the LiveInterval because there's
only one val num (definition) for the register. But PHI elimination can
assign multiple values to the same register, in which case we end up
clobbering a different val num when extending:

    %x = PseudoVSETVLI %avl, ...
    %avl = ADDI ...
    %v = PseudoVADD ..., avl=%x
    ; %avl is forwarded to PseudoVADD:
    %x = PseudoVSETVLI %avl, ...
    %avl = ADDI ...
    %v = PseudoVADD ..., avl=%avl

Here there's no way to extend the %avl from the vsetvli since %avl is
redefined, i.e. we have two val nums.

This fixes it by only forwarding it when we have exactly one val num,
where it should be safe to extend it.
2024-07-05 11:44:59 +08:00
Graham Hunter
4311b14e9c
[LSR] Recognize vscale-relative immediates (#88124)
Extends LoopStrengthReduce to recognize immediates multiplied by vscale, and query the current target for whether they are legal offsets for memory operations or adds.
2024-07-01 09:23:31 +01:00
Luke Lau
eb76bc38ff
[RISCV] Relax RISCVInsertVSETVLI output VL peeking to cover registers (#96200)
If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same
VLMAX, we treat it as the AVL of said vsetvli.

This allows us to remove a true dependency, as well as treat
VSETVLIInfos as equal in more places and avoid toggles.

We do this in two places, needVSETVLI and computeInfoForInstr. However
we don't do this in computeInfoForInstr's vsetvli equivalent,
getInfoForVSETVLI.

We also have a restriction only in computeInfoForInstr that the AVL
can't be a register as we want to avoid extending live ranges.

This patch does two interlinked things:

1) It adds this AVL "peeking" to getInfoForVSETVLI

2) It relaxes the constraint that the AVL can't be a register in
computeInfoForInstr, since it removes a use of the output VL which can
actually reduce register pressure. E.g. see the diff in
@vector_init_vsetvli_N and @test6

Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we
can remove the check in needVSETVLI.

We also need to update how we update LiveIntervals in insertVSETVLI, as
we can now end up needing to extend the LiveRange of the AVL across
blocks.
2024-06-23 20:20:59 +08:00
Nikita Popov
76d3ab2cc3
[IR] Remove support for shl constant expressions (#96037)
Remove support for shl constant expressions, as part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
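An illustrative before/after (hypothetical global @g; the constant-expression form is assumed to be rewritten into instructions):
```
; Before: a shl constant expression could appear directly as an operand, e.g.
;   store i64 shl (i64 ptrtoint (ptr @g to i64), i64 3), ptr %p
; After: the shift must be materialized as an instruction instead.
%base = ptrtoint ptr @g to i64
%shifted = shl i64 %base, 3
store i64 %shifted, ptr %p
```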
2024-06-20 08:59:29 +02:00
Philip Reames
cb76896d6e
[SCEVExpander] Recognize urem idiom during expansion (#96005)
If we have a urem expression, emitting it as a urem is significantly
better than letting the full expansion kick in. We risk duplicating a
udiv or mul which could previously have been shared, but losing that
sharing seems like a reasonable tradeoff for being able to round-trip a
urem without modification.
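A sketch of the two expansion strategies (hypothetical %a and %b):
```
; Emit the idiom directly:
%rem = urem i64 %a, %b
; rather than the full expansion:
%q = udiv i64 %a, %b
%qb = mul i64 %q, %b
%rem.expanded = sub i64 %a, %qb
```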
2024-06-19 08:40:04 -07:00
Stephen Tozer
094572701d
[RemoveDIs] Print IR with debug records by default (#91724)
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records. 

If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.

For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
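An illustrative sketch of the two output forms (metadata node numbers are hypothetical):
```
; Intrinsic form (old default):
call void @llvm.dbg.value(metadata i32 %x, metadata !10, metadata !DIExpression()), !dbg !15
; Record form (new default):
  #dbg_value(i32 %x, !10, !DIExpression(), !15)
```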
2024-06-14 15:07:27 +01:00
Graham Hunter
e16f2f5d24
[AArch64] Override isLSRCostLess, take number of instructions into account (#84189)
Adds an AArch64-specific version of isLSRCostLess, changing the relative
importance of the various terms from the formulae being evaluated.

This has been split out from my vscale-aware LSR work, see the RFC for
reference:
https://discourse.llvm.org/t/rfc-vscale-aware-loopstrengthreduce/77131
2024-06-06 14:45:36 +01:00