689 Commits

Author SHA1 Message Date
Florian Hahn
e8908215de
[LSR] Support SCEVPtrToAddr in SCEVDbgValueBuilder.
Allow SCEVPtrToAddr as cast in assertion in SCEVDbgValueBuilder.
SCEVPtrToAddr is handled similarly to SCEVPtrToInt.

Fixes a crash with debug info after bd40d1de9c9ee, which started to
generate ptrtoaddr instead of ptrtoint expressions.
2026-02-07 14:02:45 +00:00
Nikita Popov
0809a9927d [LSR] Add another test case for implicit truncation (NFC)
From https://github.com/llvm/llvm-project/pull/171456#issuecomment-3675488354.
2025-12-19 16:34:17 +01:00
Nikita Popov
8c93254dfc [LSR] Add test for implicit truncation on icmp immediate (NFC)
Test case for:
https://github.com/llvm/llvm-project/pull/171456#issuecomment-3670755137
2025-12-19 10:07:50 +01:00
John Brawn
ccd4e7b1ed
[LSR] Make OptimizeLoopTermCond able to handle some non-cmp conditions (#165590)
Currently OptimizeLoopTermCond can only convert a cmp instruction to
using a postincrement induction variable, which means it can't handle
predicated loops where the termination condition comes from
get_active_lane_mask. Relax this restriction so that we can handle any
kind of instruction, though only if it's the instruction immediately
before the branch (except for possibly an extractelement).
2025-12-03 15:28:46 +00:00
John Brawn
2ad71745cd
[LSR] Insert the transformed IV increment in the user block (#169515)
Currently we try to hoist the transformed IV increment instruction to
the header block to help with generation of postincrement instructions,
but this only works if the user instruction is also in the header. We
should instead be trying to insert it in the same block as the user.
2025-12-02 17:15:00 +00:00
Matt Arsenault
793ab6a80d
X86: Enable terminal rule (#165957) 2025-11-10 12:34:00 -08:00
Sander de Smalen
7fe60a7685
[AArch64][SVE] Avoid movprfx by reusing register for _UNDEF pseudos. (#166926)
For predicated SVE instructions where we know that the inactive lanes
are undef, it is better to pick a destination register that is not
unique. This avoids introducing a movprfx to copy a unique register to
the destination operand, which would be needed to comply with the tied
operand constraints.

For example:
```
  %src1 = COPY $z1
  %src2 = COPY $z2
  %dst = SDIV_ZPZZ_S_UNDEF %p, %src1, %src2
```
Here it is beneficial to pick $z1 or $z2 as the destination register,
because if it would have chosen a unique register (e.g. $z0) then the
pseudo expand pass would need to insert a MOVPRFX to expand the
operation into:
```
  $z0 = SDIV_ZPZZ_S_UNDEF $p0, $z1, $z2

  ->

  $z0 = MOVPRFX $z1
  $z0 = SDIV_ZPmZ_S $p0, $z0, $z2
```
By picking $z1 directly, we'd get:
```
  $z1 = SDIV_ZPmZ_S, $p0 $z1, $z2
```
2025-11-10 16:33:59 +00:00
John Brawn
53e7443e0c
[LSR] Don't count conditional loads/store as enabling pre/post-index (#159573)
When a load/store is conditionally executed in a loop it isn't a
candidate for pre/post-index addressing, as the increment of the address
would only happen on those loop iterations where the load/store is
executed.

Detect this and only discount the AddRec cost when the load/store is
unconditional.
2025-10-30 13:53:15 +00:00
paperchalice
249883d0c5
[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786)
Post cleanup for #164534.
2025-10-23 20:31:31 +08:00
Florian Hahn
f78150d2d4
Reapply "[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1." (#158328)
This reverts commit fd58f235f8c5bd40d98acfd8e7fb11d41de301c7.

The recommit contains an extra check to make sure that D is a multiple of
C2, if C2 > C1. This fixes the issue causing the revert fd58f235f8c. Tests
have been added in 6a726e9a4d3d0.

Original message:
If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1).

Depends on https://github.com/llvm/llvm-project/pull/157555.

Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN

PR: https://github.com/llvm/llvm-project/pull/157656
2025-09-17 14:28:30 +01:00
Florian Hahn
6a726e9a4d
[SCEV] Add more tests for MUL/UDIV folds.
Add additional test coverage for
https://github.com/llvm/llvm-project/pull/157656 covering cases where
the dividend is not a multiple of the divisor, causing the revert of
70012fda631.
2025-09-17 13:05:49 +01:00
John Brawn
8fab81121e
[LSR] Add an addressing mode that considers all addressing modes (#158110)
The way that loops strength reduction works is that the target has to
upfront decide whether it wants its addressing to be preindex,
postindex, or neither. This choice affects:
 * Which potential solutions we generate
* Whether we consider a pre/post index load/store as costing an AddRec
or not.

None of these choices are a good fit for either AArch64 or ARM, where
both preindex and postindex addressing are typically free:
* If we pick None then we count pre/post index addressing as costing one
addrec more than is correct so we don't pick them when we should.
* If we pick PreIndexed or PostIndexed then we get the correct cost for
that addressing type, but still get it wrong for the other and also
exclude potential solutions using offset addressing that could have less
cost.

This patch adds an "all" addressing mode that causes all potential
solutions to be generated and counts both pre and postindex as having
AddRecCost of zero. Unfortuntely this reveals problems elsewhere in how
we calculate the cost of things that need to be fixed before we can make
use of it.
2025-09-16 11:46:54 +01:00
Reid Kleckner
fd58f235f8
Revert "[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1." (#158328)
Reverts llvm/llvm-project#157656

There are multiple reports that this is causing miscompiles in the MSan
test suite after bootstrapping and that this is causing miscompiles in
rustc. Let's revert for now, and work to capture a reproducer next week.
2025-09-12 10:15:41 -07:00
Florian Hahn
70012fda63
[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1. (#157656)
If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1).

Depends on https://github.com/llvm/llvm-project/pull/157555.

Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN

PR: https://github.com/llvm/llvm-project/pull/157656
2025-09-11 08:08:47 +00:00
Florian Hahn
74ec38fad0
[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730)
Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9

PR: https://github.com/llvm/llvm-project/pull/156730
2025-09-05 08:45:13 +00:00
Florian Hahn
7c10e57f54
[LSR] Move test from Analysis/ScalarEvolution to Transforms, regen.
Move transform test to llvm/test/Transforms/LoopStrengthReduce/X86,
clean up the names a bit and regenerate check lines.
2025-09-03 19:44:45 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Paul Walker
ceb2b9c141
[LLVM][DAGCombiner] fold (shl (X * vscale(C0)), C1) -> (X * vscale(C0 << C1)). (#150651) 2025-08-01 11:42:45 +01:00
Nikita Popov
5f531827a4
[LSR] Do not consider uses in lifetime intrinsics (#149492)
We should ignore uses of pointers in lifetime intrinsics, as these are
not actually materialized in the final code, so don't affect register
pressure or anything else LSR needs to model.
    
Handling these only results in peculiar rewrites where additional
intermediate GEPs are introduced.
2025-07-18 16:13:00 +02:00
Shan Huang
089106fdfb
[DebugInfo][LoopStrengthReduce] Salvage the debug value of the dead cmp instruction (#147241)
Fix #147238
2025-07-14 09:45:37 +08:00
Nikita Popov
a9c6f1ed4f Revert "[LSR] Regenerate test checks (NFC)"
This reverts commit a60405c2d51ffe428dd38b8b1020c19189968f76.

Causes buildbot failures.
2025-07-11 14:11:42 +02:00
Nikita Popov
a60405c2d5 [LSR] Regenerate test checks (NFC) 2025-07-11 13:14:02 +02:00
Sergey Shcherbinin
8472eb1361
[AArch64LoadStoreOpt] BaseReg update is searched also in CF successor (#145583)
Look for reg update instruction (to merge w/ mem instruction into
pre/post-increment form) not only inside a single MBB but also along a
CF path going downward w/o side enters such that BaseReg is alive along
it but not at its exits.

Regression test is updated accordingly.
2025-07-11 11:06:11 +01:00
Guy David
76274eb2b3
[PHIElimination] Revert #131837 #146320 #146337 (#146850)
Reverting because mis-compiles:
- https://github.com/llvm/llvm-project/pull/131837
- https://github.com/llvm/llvm-project/pull/146320
- https://github.com/llvm/llvm-project/pull/146337
2025-07-03 07:48:08 -04:00
Guy David
f5c62ee0fa
[PHIElimination] Reuse existing COPY in predecessor basic block (#131837)
The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout, see the regression test in the first
commit.

This change affects many architectures but the amount of total
instructions in the test cases seems too be slightly lower.
2025-06-29 21:28:42 +03:00
John Brawn
a54712c8ec
[LSR] Make canHoistIVInc allow non-integer types (#143707)
canHoistIVInc was made to only allow integer types to avoid a crash in
isIndexedLoadLegal/isIndexedStoreLegal due to them failing an assertion
in getValueType (or rather in MVT::getVT which gets called from that)
when passed a struct type. Adjusting these functions to pass
AllowUnknown=true to getValueType means we don't get an assertion
failure (MVT::Other is returned which TLI->isIndexedLoadLegal should
then return false for), meaning we can remove this check for integer
type.
2025-06-16 15:23:40 +01:00
Ruiling, Song
0487db1f13
MachineScheduler: Improve instruction clustering (#137784)
The existing way of managing clustered nodes was done through adding
weak edges between the neighbouring cluster nodes, which is a sort of
ordered queue. And this will be later recorded as `NextClusterPred` or
`NextClusterSucc` in `ScheduleDAGMI`.

But actually the instruction may be picked not in the exact order of the
queue. For example, we have a queue of cluster nodes A B C. But during
scheduling, node B might be picked first, then it will be very likely
that we only cluster B and C for Top-Down scheduling (leaving A alone).

Another issue is:
```
   if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum)
      std::swap(SUa, SUb);
   if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster)))
```
may break the cluster queue.

For example, we want to cluster nodes (order as in `MemOpRecords`): 1 3
2. 1(SUa) will be pred of 3(SUb) normally. But when it comes to (3, 2),
As 3(SUa) > 2(SUb), we would reorder the two nodes, which makes 2 be
pred of 3. This makes both 1 and 2 become preds of 3, but there is no
edge between 1 and 2. Thus we get a broken cluster chain.

To fix both issues, we introduce an unordered set in the change. This
could help improve clustering in some hard case.

One key reason the change causes so many test check changes is: As the
cluster candidates are not ordered now, the candidates might be picked
in different order from before.

The most affected targets are: AMDGPU, AArch64, RISCV.

For RISCV, it seems to me most are just minor instruction reorder, don't
see obvious regression.

For AArch64, there were some combining of ldr into ldp being affected.
With two cases being regressed and two being improved. This has more
deeper reason that machine scheduler cannot cluster them well both
before and after the change, and the load combine algorithm later is
also not smart enough.

For AMDGPU, some cases have more v_dual instructions used while some are
regressed. It seems less critical. Seems like test `v_vselect_v32bf16`
gets more buffer_load being claused.
2025-06-05 15:28:04 +08:00
Alexander Richardson
ee13638362
[AMDGPU] Remove explicit datalayout from tests where not needed
Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present to avoid the
string becoming out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated to ensure that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.

Reviewed By: shiltian, arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137921
2025-04-30 10:58:17 -07:00
Ricardo Jesus
34f9ddf1ce
[AArch64][SVE] Fold ADD+CNTB to INCB/DECB (#118280)
Currently, given:
```cpp
uint64_t incb(uint64_t x) {
  return x+svcntb();
}
```
LLVM generates:
```gas
incb:
        addvl   x0, x0, #1
        ret
```
Which is equivalent to:
```gas
incb:
        incb    x0
        ret
```

However, on microarchitectures like the Neoverse V2 and Neoverse V3,
the second form (with INCB) can have significantly better latency and
throughput (according to their SWOG). On the Neoverse V2, for example,
ADDVL has a latency and throughput of 2, whereas some forms of INCB
have a latency of 1 and a throughput of 4. The same applies to DECB.
This patch adds patterns to prefer the cheaper INCB/DECB forms over
ADDVL where applicable.
2025-04-17 08:41:17 +01:00
YunQiang Su
1c7ab39f3d
MIPS: Set EnableLoopTermFold (#133454)
Setting `EnableLoopTermFold` enables `loop-term-fold` pass.
2025-03-29 21:51:59 +08:00
Weaver
af150272cf Revert "MIPS: Set EnableLoopTermFold (#133378)"
This reverts commit 71f43a7c42a37d18be98b0885d62ef76e658f242.

Caused build bot failures:
https://lab.llvm.org/buildbot/#/builders/190/builds/17267
https://lab.llvm.org/buildbot/#/builders/144/builds/21445
https://lab.llvm.org/buildbot/#/builders/46/builds/14343

please consider fixing the test failure in long-array-initialize.ll before
recommitting.
2025-03-28 11:31:56 +00:00
YunQiang Su
71f43a7c42
MIPS: Set EnableLoopTermFold (#133378)
Setting `EnableLoopTermFold` enables `loop-term-fold` pass.
2025-03-28 16:49:45 +08:00
YunQiang Su
6dd14c881c
MIPS: Implements MipsTTIImpl::isLSRCostLess using Insns as first (#133068)
So that LoopStrengthReduce can work for MIPS.
The code is copied from RISC-V.

---------

Co-authored-by: qethu <190734095+qethu@users.noreply.github.com>
2025-03-26 23:01:26 +08:00
Jeremy Morse
792a6f8119
[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298)
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.

Nowadays, non-intrinsic format is the default and has been on for more
than a year, there's no need for this flag to exist.

(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
2025-03-14 15:50:49 +00:00
Ricardo Jesus
15fbdc2b96
[AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (#127837)
Currently, given:
```cpp
svuint8_t foo(uint8_t *x) {
  return svld1(svptrue_b8(), x);
}
```
We generate:
```gas
foo:
  ptrue   p0.b
  ld1b    { z0.b }, p0/z, [x0]
  ret
```
However, on little-endian and with unaligned memory accesses allowed, we
could instead be using LDR as follows:
```gas
foo:
  ldr     z0, [x0]
  ret
```

The second form avoids the predicate dependency.
Likewise for other types and stores.
2025-02-26 13:56:35 +00:00
Sergei Barannikov
ff9c041d96
[MachineScheduler] Fix physreg dependencies of ExitSU (#123541)
Providing the correct operand index allows addPhysRegDataDeps to compute
the correct latency.

Pull Request: https://github.com/llvm/llvm-project/pull/123541
2025-02-01 20:40:50 +03:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
the `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying a NVPTX kernel than using the metadata which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
2025-01-07 18:24:50 -08:00
Fangrui Song
9efa7d7af3
Remove -print-lsr-output in favor of --stop-after=loop-reduce
Pull Request: https://github.com/llvm/llvm-project/pull/121305
2024-12-29 18:58:30 -08:00
Lee Wei
abb9f9fa06
[llvm] Remove br i1 undef from some regression tests [NFC] (#117112)
This PR removes tests with `br i1 undef` under
`llvm/tests/Transforms/Loop*, Lower*`.
2024-11-21 08:06:56 +00:00
Nikita Popov
6dc23b7009
[SCEVExpander] Don't try to reuse SCEVUnknown values (#115141)
The expansion of a SCEVUnknown is trivial (it's just the wrapped value).
If we try to reuse an existing value it might be a more complex
expression that simplifies to the SCEVUnknown.

This is inspired by https://github.com/llvm/llvm-project/issues/114879,
because SCEVExpander replacing a constant with a phi node is just silly.
(I don't consider this a fix for that issue though.)
2024-11-11 12:36:29 +01:00
Yingwei Zheng
0b9f1cc024
[SCEV] Disallow simplifying phi(undef, X) to X (#115109)
See the following case:
```
@GlobIntONE = global i32 0, align 4

define ptr @src() {
entry:
  br label %for.body.peel.begin

for.body.peel.begin:                              ; preds = %entry
  br label %for.body.peel

for.body.peel:                                    ; preds = %for.body.peel.begin
  br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel

cleanup.loopexit.peel:                            ; preds = %for.body.peel
  br label %cleanup.peel

cleanup.peel:                                     ; preds = %cleanup.loopexit.peel, %for.body.peel
  %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]
  br i1 true, label %for.body.peel.next, label %cleanup7

for.body.peel.next:                               ; preds = %cleanup.peel
  br label %for.body.peel.next1

for.body.peel.next1:                              ; preds = %for.body.peel.next
  br label %entry.peel.newph

entry.peel.newph:                                 ; preds = %for.body.peel.next1
  br label %for.body

for.body:                                         ; preds = %cleanup, %entry.peel.newph
  %retval.0 = phi ptr [ %retval.2.peel, %entry.peel.newph ], [ %retval.2, %cleanup ]
  br i1 false, label %cleanup, label %cleanup.loopexit

cleanup.loopexit:                                 ; preds = %for.body
  br label %cleanup

cleanup:                                          ; preds = %cleanup.loopexit, %for.body
  %retval.2 = phi ptr [ %retval.0, %for.body ], [ @GlobIntONE, %cleanup.loopexit ]
  br i1 false, label %for.body, label %cleanup7.loopexit

cleanup7.loopexit:                                ; preds = %cleanup
  %retval.2.lcssa.ph = phi ptr [ %retval.2, %cleanup ]
  br label %cleanup7

cleanup7:                                         ; preds = %cleanup7.loopexit, %cleanup.peel
  %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ]
  ret ptr %retval.2.lcssa
}

define ptr @tgt() {
entry:
  br label %for.body.peel.begin

for.body.peel.begin:                              ; preds = %entry
  br label %for.body.peel

for.body.peel:                                    ; preds = %for.body.peel.begin
  br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel

cleanup.loopexit.peel:                            ; preds = %for.body.peel
  br label %cleanup.peel

cleanup.peel:                                     ; preds = %cleanup.loopexit.peel, %for.body.peel
  %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]
  br i1 true, label %for.body.peel.next, label %cleanup7

for.body.peel.next:                               ; preds = %cleanup.peel
  br label %for.body.peel.next1

for.body.peel.next1:                              ; preds = %for.body.peel.next
  br label %entry.peel.newph

entry.peel.newph:                                 ; preds = %for.body.peel.next1
  br label %for.body

for.body:                                         ; preds = %cleanup, %entry.peel.newph
  br i1 false, label %cleanup, label %cleanup.loopexit

cleanup.loopexit:                                 ; preds = %for.body
  br label %cleanup

cleanup:                                          ; preds = %cleanup.loopexit, %for.body
  br i1 false, label %for.body, label %cleanup7.loopexit

cleanup7.loopexit:                                ; preds = %cleanup
  %retval.2.lcssa.ph = phi ptr [ %retval.2.peel, %cleanup ]
  br label %cleanup7

cleanup7:                                         ; preds = %cleanup7.loopexit, %cleanup.peel
  %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ]
  ret ptr %retval.2.lcssa
}
```
1. `simplifyInstruction(%retval.2.peel)` returns `@GlobIntONE`. Thus,
`ScalarEvolution::createNodeForPHI` returns SCEV expr `@GlobIntONE` for
`%retval.2.peel`.
2. `SimplifyIndvar::replaceIVUserWithLoopInvariant` tries to replace the
use of `%retval.2.peel` in `%retval.2.lcssa.ph` with `@GlobIntONE`.
3. `simplifyLoopAfterUnroll -> simplifyLoopIVs -> SCEVExpander::expand`
reuses `%retval.2.peel = phi ptr [ undef, %for.body.peel ], [
@GlobIntONE, %cleanup.loopexit.peel ]` to generate code for
`@GlobIntONE`. It is incorrect.

This patch disallows simplifying `phi(undef, X)` to `X` by setting
`CanUseUndef` to false.
Closes https://github.com/llvm/llvm-project/issues/114879.
2024-11-07 15:53:51 +08:00
Youngsuk Kim
caa32e6d6f
[llvm][LSR] Fix where invariant on ScaledReg & Scale is violated (#112576)
Comments attached to the `ScaledReg` field of `struct Formula` explains
that, `ScaledReg` must be non-null when `Scale` is non-zero.

This fixes up a code path where this invariant is violated. Also, add an
assert to ensure this invariant holds true.

Without this patch, compiler aborts with the attached test case.

Fixes #76504
2024-10-17 10:47:44 -04:00
Orlando Cazalet-Hyams
7506872afc
[DebugInfo][LSR] Fix assertion failure salvaging IV with offset > 64 bits wide (#110979)
Fixes #110494
2024-10-03 11:47:08 +01:00
Jeremy Morse
e6bf48d110
[X86] Don't request 0x90 nop filling in p2align directives (#110134)
As of rev ea222be0d, LLVMs assembler will actually try to honour the
"fill value" part of p2align directives. X86 printed these as 0x90, which
isn't actually what it wanted: we want multi-byte nops for .text
padding. Compiling via a textual assembly file produces single-byte
nop padding since ea222be0d but the built-in assembler will produce
multi-byte nops. This divergent behaviour is undesirable.

To fix: don't set the byte padding field for x86, which allows the
assembler to pick multi-byte nops. Test that we get the same multi-byte
padding when compiled via textual assembly or directly to object file.
Added same-align-bytes-with-llasm-llobj.ll to that effect, updated
numerous other tests to not contain check-lines for the explicit padding.
2024-10-02 11:14:05 +01:00
Nikita Popov
9f3d1695eb
[SCEVExpander] Preserve gep nuw during expansion (#102133)
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
2024-10-02 11:45:00 +02:00
Nikita Popov
4b3ba64ba7
[SCEVExpander] Clear flags when reusing GEP (#109293)
As pointed out in the review of #102133, SCEVExpander currently
incorrectly reuses GEP instructions that have poison-generating flags
set. Fix this by clearing the flags on the reused instruction.
2024-10-01 14:22:54 +02:00
Nikita Popov
c4744e49f6 [LSR] Regenerate test checks (NFC) 2024-09-19 16:34:10 +02:00
Sergey Kachkov
1f2a634c44 Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one is corresponding to
i variable (add rdx, 4), the other - to res (add rax, 2). The second
induction variable can be removed by rewriteLoopExitValues() method
(final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses:
one in LCSSA phi node, the other - in induction phi node. Here we have 3
uses of this value because of duplicated lcssa nodes, so the transform
doesn't apply and leads to an extra add operation inside the loop. The
proposed solution is to accumulate inserted instructions that will
require LCSSA form update into SetVector and then call
formLCSSAForInstructions for this SetVector once, so the same
instructions don't process twice.

Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation
when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are
moved into separate duplicated-phis.ll test with explicit x86 target requirement
to fix bots which are not building this target.
2024-09-09 16:14:51 +03:00
Sergey Kachkov
17f0c5dfaa [LSR][NFC] Add pre-commit test 2024-09-09 16:14:50 +03:00