616 Commits

Author SHA1 Message Date
Nikita Popov
beba307c5b [LSR] Clear SCEVExpander before deleting phi nodes
Fixes https://github.com/llvm/llvm-project/issues/84709.
2024-03-12 16:24:10 +01:00
Nikita Popov
6409c21857 [SCEVExpander] Use PoisoningVH for OrigFlags
It's common to delete some instructions after using SCEVExpander,
while it is still live (but will not be used afterwards). In that
case, the AssertingVH may trigger. Replace it with a PoisoningVH
so that we only detect the case where the SCEVExpander actually is
used in a problematic fashion after the instruction removal.

The alternative would be to add clear() calls to more code paths.

Fixes https://github.com/llvm/llvm-project/issues/83404.
2024-03-05 16:41:52 +01:00
Patrick O'Neill
8d6e867eb2
[LSR][term-fold] Ensure the simple recurrence is from the current loop (#83085)
If the phi node found by matchSimpleRecurrence is not from the current
loop, then isAlmostDeadIV panics. With this patch we bail out early.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-03-04 16:40:40 -08:00
Nikita Popov
72763521c3 [LSR] Clear SCEVExpander before calling DeleteDeadPHIs
To avoid an assertion failure when an AssertingVH is removed,
as reported in:
https://github.com/llvm/llvm-project/pull/82362#issuecomment-1960067147

Also remove an unnecessary use of SCEVExpanderCleaner.
2024-02-22 22:50:26 +01:00
Nikita Popov
365bf43b8b [LSR] Regenerate test checks (NFC)
Check output IR instead of debug output. The debug output will
change in an upcoming patch in an irrecoverable way.
2024-02-07 11:14:49 +01:00
Nikita Popov
2d69827c5c [Transforms] Convert tests to opaque pointers (NFC) 2024-02-05 11:57:34 +01:00
Philip Reames
28865da374
[LSR][term-fold] Adjust expansion budget based on trip count (#80304)
Follow up to https://github.com/llvm/llvm-project/pull/74747

This change extends the previously added fixed expansion threshold by
scaling down the cost allowed for an expansion for a loop with either a
small known trip count or a profile which indicates the trip count is
likely small. The goal here is to improve code generation for a loop
nest where the outer loop has a high trip count, and the inner loop runs
only a handful of iterations.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-02-02 08:57:26 -08:00
Philip Reames
390d66b03b [LSR] Add tests for restricting term-fold budget based on exact trip count 2024-02-01 07:47:04 -08:00
Philip Reames
f264da4322
[lsr][term-fold] Restrict transform to low cost expansions (#74747)
This is a follow up to an item I noted in my submission comment for
e947f95. I don't have a real world example where this is triggering
unprofitably, but avoiding the transform when we estimate the loop to be
short running from profiling seems quite reasonable. It's also now come
up as a possibility in a regression twice in two days, so I'd like to
get this in to close out the possibility if nothing else.

The original review dropped the threshold for short trip count loops. I
will return to that in a separate review if this lands.
2024-01-31 14:48:20 -08:00
Philip Reames
5282202db0 [LSR] Add a test case mentioned in review
As mentioned in https://github.com/llvm/llvm-project/pull/74747, this case is triggering a particularly high cost trip count expansion.
2024-01-31 12:32:50 -08:00
paperchalice
e390c229a4
[Pass] Add hyphen to some pass names (#74287)
Here is the list of the renamed passes:
- `callbrprepare` -> `callbr-prepare`
- `dwarfehprepare` -> `dwarf-eh-prepare`
- `flattencfg` -> `flatten-cfg`
- `loweratomic` -> `lower-atomic`
- `lowerinvoke` -> `lower-invoke`
- `lowerswitch` -> `lower-switch`
- `winehprepare` -> `win-eh-prepare`
- `targetir` -> `target-ir`
- `targetlibinfo` -> `target-lib-info`

Legacy passes are not affected.
2024-01-25 16:05:54 +08:00
Stephen Tozer
7c53e9f667
[RemoveDIs][DebugInfo] Add support for DPValues to LoopStrengthReduce (#78706)
This patch trivially extends support for DbgValueInst recovery to
DPValues in LoopStrengthReduce; they are handled identically, so this is
mostly done by reusing the DbgValueInst code (using templates or
auto-parameter lambdas to reduce actual code duplication).
2024-01-22 18:59:19 +00:00
Fangrui Song
9e9907f1cf
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
  LLVM :: CodeGen/AMDGPU/fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fabs.ll
  LLVM :: CodeGen/AMDGPU/floor.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
  LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
  LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
2024-01-16 21:54:58 -08:00
Florian Hahn
ab33c0b96e
[LSR] Add test showing incorrectly adding nuw with #77827.
Extra test for https://github.com/llvm/llvm-project/pull/77827, where
NUW gets added to the AddRec due to the BTC being 0.
2024-01-15 21:31:28 +00:00
Philip Reames
e4d01bb227
[SCEV] Special case sext in isKnownNonZero (#77834)
The existing logic in isKnownNonZero relies on unsigned ranges, which
can be problematic when our range calculation is imprecise. Consider the
following:
  %offset.nonzero = or i32 %offset, 1
  -->  %offset.nonzero U: [1,0) S: [1,0)
  %offset.i64 = sext i32 %offset.nonzero to i64
  -->  (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648)
                                         S: [-2147483648,2147483648)

Note that the unsigned range for the sext does contain zero in this case
despite the fact that it can never actually be zero.

Instead, we can push the query down one level - relying on the fact that
the sext is an invertible operation and that the result can only be zero
if the input is. We could likely generalize this reasoning for other
invertible operations, but special casing sext seems worthwhile.
2024-01-12 07:45:28 -08:00
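The pushdown argument in e4d01bb227 can be checked by brute force on a narrow type. The following is a minimal sketch in Python, assuming an 8-bit to 16-bit sext in place of the i32 to i64 case from the commit message. All helper names are hypothetical, and this is not LLVM's implementation:

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sext(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide value to to_bits (result is unsigned)."""
    return to_signed(value, from_bits) & ((1 << to_bits) - 1)


def sext_zero_iff_input_zero(from_bits=8, to_bits=16):
    """sext is invertible, so the result is zero exactly when the input is."""
    return all(
        (sext(v, from_bits, to_bits) == 0) == (v == 0)
        for v in range(1 << from_bits)
    )


def hull_of_or_one(from_bits=8):
    """Smallest signed interval [lo, hi] covering every value of (x | 1)."""
    values = [to_signed(x | 1, from_bits) for x in range(1 << from_bits)]
    return min(values), max(values)
```

The interval returned by `hull_of_or_one` straddles zero even though `x | 1` is never zero, which is the kind of imprecision the commit message describes. Asking the question about the sext operand instead sidesteps the range entirely.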
Philip Reames
5ce067d592 Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V."
This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus
re-enables terminator folding for RISCV.  The reported miscompile has
been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.
2024-01-11 13:20:02 -08:00
Philip Reames
f5dd70c582
[LSR] Require non-zero step when considering wrap around for term folding (#77809)
The term folding logic needs to prove that the induction variable does
not cycle through the same set of values so that testing for the value
of the IV on the exiting iteration is guaranteed to trigger only on that
iteration. The prior code checked the no-self-wrap property on the IV,
but this is insufficient as a zero step is trivially no-self-wrap per
SCEV's definition but does repeat the same series of values.

In the current form, this has the effect of basically disabling LSR's
term-folding for all non-constant strides. This is still a net
improvement: term-folding is currently disabled entirely, so being able
to enable it for constant strides is a gain.

As future work, there are two SCEV weaknesses worth investigating.

First sext (or i32 %a, 1) to i64 does not return true for
isKnownNonZero. This is because we check only the unsigned range in that
query. We could either do query pushdown, or check the signed range as
well. I tried the second locally and it has very broad impact - i.e. we
have a bunch of missing optimizations here.

Second, zext (or i32 %a, 1) to i64 as the increment to the IV in
expensive_expand_short_tc causes the addrec to no longer be provably
no-self-wrap. I didn't investigate this so it might be necessary, but
the loop structure is such that I find this result surprising.
2024-01-11 10:07:17 -08:00
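Why f5dd70c582 needs a non-zero step can be seen in a small simulation. The following is a sketch in Python under stated assumptions: an 8-bit secondary IV, a folded exit test that compares the post-incremented IV against `start + step * trip_count`, and a hypothetical helper name. It models the idea only, not LSR's code:

```python
def iterations_until_exit(start, step, trip_count, bits=8):
    """Count iterations run by a loop whose exit test was folded onto a
    secondary IV {start,+,step}.

    The folded test fires when the IV equals its value on the exiting
    iteration. If the IV revisits that value early, the loop exits early.
    """
    mask = (1 << bits) - 1
    exit_value = (start + step * trip_count) & mask
    iv = start & mask
    iterations = 0
    while True:
        iterations += 1
        iv = (iv + step) & mask  # post-increment, then compare
        if iv == exit_value:
            return iterations
```

With `step=1` and `trip_count=10` the loop runs the intended 10 iterations. With `step=0` the IV never changes, so the folded test fires on the first iteration. A step that wraps the 8-bit IV back onto the exit value early (for example `step=64`) shows the self-wrap case that the earlier check already covered.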
Craig Topper
fdb87640ee [LSR][TTI][RISCV] Disable terminator folding for RISC-V.
This is a partial revert of e947f953370abe8ffc8713b8f3250a3ec39599fe.

It caused a miscompile in downstream testing.

Spoke with Philip offline. We believe the issue is that LSR needs to
make sure the Step of the other AddRec is non-zero. Reverting until
Philip is back from vacation.
2023-12-27 15:13:32 -08:00
Jessica Del
32f9983c06
[AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
2023-12-15 15:49:25 +01:00
Philip Reames
04cbfcc33a [test][lsr] Add term-folding test cases with estimated trip counts 2023-12-07 10:34:29 -08:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Philip Reames
e947f95337 [LSR][TTI][RISCV] Enable terminator folding for RISC-V
If looking for a miscompile revert candidate, look here!

The transform being enabled prefers comparing to a loop invariant
exit value for a secondary IV over using an otherwise dead primary
IV.  This increases register pressure (by requiring the exit value
to be live through the loop), but reduces the number of instructions
within the loop by one.

On RISC-V which has a large number of scalar registers, this is
generally a profitable transform.  We lose the ability to use a beqz
on what is typically a count down IV, and pay the cost of computing
the exit value on the secondary IV in the loop preheader, but save
an add or sub in the loop body.  For anything except an extremely
short running loop, or one with extreme register pressure, this is
profitable.  On spec2017, we see a 0.42% geomean improvement in
dynamic icount, with no individual workload regressing by more than
0.25%.

Code size wise, we trade a (possibly compressible) beqz and a (possibly
compressible) addi for an uncompressible beq.  We also add instructions
in the preheader.  Net result is a slight regression overall, but
neutral or better inside the loop.

Previous versions of this transform had numerous corner-case correctness
bugs.  All of the ones I can spot by inspection have been fixed, and I
have run this through all of spec2017, but there may be further issues
lurking.  Adding uses to an IV is a fraught thing to do given poison
semantics, so this transform is somewhat inherently risky.

This patch is a reworked version of D134893 by @eop.  That patch has
been abandoned since May, so I picked it up, reworked it a bit, and
am landing it.
2023-11-29 12:04:06 -08:00
Daniil Suchkov
1344b65c90
[SCEV] Fix incorrect NUW inference (#70521)
This patch fixes a miscompile in LSR caused by incorrect inference of
NUW flag for AddRec: we shouldn't infer no-wrap flags based on a
comparison which doesn't fully control the loop exit.
2023-10-31 11:43:57 -07:00
Danila Malyutin
ba1349fc31
[SCEV] Fix "quick and dirty" difference that could lead to assert (#70688)
The old algorithm would remove all operands matching %step SCEV when it
intended to only remove a single one. This led to an assert when
SCEVAddExpr was of the form %step + %step and potential miscompiles in
similar cases. Such SCEVs could be created when construction reached
depth thresholds.

Fixes #70348
2023-10-31 00:50:57 +03:00
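The difference between removing every matching operand and removing a single one, as described in ba1349fc31, can be illustrated on a plain list. This is a sketch in Python with hypothetical function names and string placeholders for SCEV operands. It is not the code from the patch:

```python
def remove_all_matching(operands, step):
    """Old behaviour as described: drop every operand equal to step."""
    return [op for op in operands if op != step]


def remove_single_match(operands, step):
    """Intended behaviour: drop only the first operand equal to step."""
    remaining = list(operands)
    if step in remaining:
        remaining.remove(step)  # list.remove deletes the first match only
    return remaining
```

For an add expression of the form `%step + %step`, the first helper leaves no operands at all, while the second leaves one `%step`, which is what subtracting a single step should produce.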
Daniil Suchkov
505e32302c [Test] NFC. Add missing "REQUIRES: x86-registered-target" to LoopStrengthReduce/scev-incorrect-nuw-inference.ll 2023-10-27 20:48:54 +00:00
Daniil Suchkov
33330966e5 [Test] NFC. Add a test exposing a SCEV bug causing an LSR miscompile 2023-10-27 20:16:44 +00:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible is to differentiate a missing
datalayout from `target datalayout = ""` in the IR file, since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Nikita Popov
1c9b63f103 [LSR] Regenerate test checks (NFC) 2023-09-22 12:40:37 +02:00
Nikita Popov
4de93db447 [LSR] Regenerate test checks (NFC)
While there, also remove some UB from the test.
2023-09-21 16:34:44 +02:00
Vedant Paranjape
5a9a02f67b [SCEV] Compute SCEV for ashr(add(shl(x, n), c), m) instr triplet
%x = shl i64 %w, n
%y = add i64 %x, c
%z = ashr i64 %y, m

The instruction triplet above is seen many times in the generated
LLVM IR, but the SCEV model is not able to compute the SCEV value of the
AShr instruction in this case.

This patch models the two cases of the above instruction pattern using
the following expression:

=> sext(add(mul(trunc(w), 2^(n-m)), c >> m))

1) when n = m the expression reduces to sext(add(trunc(w), c >> n))
as n-m=0, and multiplying with 2^0 gives the same result.

2) when n > m the expression works as given above.

It also adds several unit tests to verify that SCEV is able to compute
the value.

$ opt sext-add-inreg.ll -passes="print<scalar-evolution>"

Comparing the snippets of the result of SCEV analysis:

* SCEV of ashr before change
----------------------------
%idxprom = ashr exact i64 %sext, 32
  -->  %idxprom U: [-2147483648,2147483648) S: [-2147483648,2147483648)
  Exits: 8                LoopDispositions: { %for.body: Variant }

* SCEV of ashr after change
---------------------------
%idxprom = ashr exact i64 %sext, 32
  -->  {0,+,1}<nuw><nsw><%for.body> U: [0,9) S: [0,9)
  Exits: 8                LoopDispositions: { %for.body: Computable }

The LoopDisposition of the given SCEV was LoopVariant before. After adding
the new way to model the instruction, the LoopDisposition becomes
LoopComputable, as SCEV is now able to compute the SCEV of the instruction.

Differential Revision: https://reviews.llvm.org/D152278
2023-08-25 05:42:08 +00:00
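The identity behind 5a9a02f67b can be sanity-checked by exhaustive search at a small bit width. The following is a sketch in Python. It assumes that the trunc narrows to `bits - m` bits and that `c >> m` is an arithmetic shift; both are assumptions, since the commit message does not state them. The function names are hypothetical:

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def ashr_add_shl(w, n, c, m, bits):
    """Evaluate ashr(add(shl(w, n), c), m) directly at the given width."""
    mask = (1 << bits) - 1
    y = ((w << n) + c) & mask
    return to_signed(y, bits) >> m  # >> on a negative Python int is arithmetic


def modeled(w, n, c, m, bits):
    """Evaluate sext(add(mul(trunc(w), 2^(n-m)), c >> m))."""
    narrow = bits - m  # assumed width of the trunc
    total = (w & ((1 << narrow) - 1)) * (1 << (n - m)) + (to_signed(c, bits) >> m)
    return to_signed(total, narrow)  # sext preserves this signed value


def count_mismatches(bits):
    """Compare both forms for every w, c and every shift pair m <= n < bits."""
    return sum(
        ashr_add_shl(w, n, c, m, bits) != modeled(w, n, c, m, bits)
        for n in range(bits)
        for m in range(n + 1)
        for w in range(1 << bits)
        for c in range(1 << bits)
    )
```

Calling `count_mismatches(6)` walks every 6-bit `w` and `c` for both cases from the commit message (`n = m` and `n > m`). The two forms agree because the low `m` bits of `shl(w, n)` are zero when `n >= m`, so adding `c` cannot carry into the bits that survive the shift.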
Harvin Iriawan
db158c7c83 [AArch64] Update generic sched model to A510
Refresh of the generic scheduling model to use A510 instead of A55.
Main benefits are to the little core, and introducing SVE scheduling information.
Changes tested on various OoO cores, no performance degradation is seen.

Differential Revision: https://reviews.llvm.org/D156799
2023-08-21 12:25:15 +01:00
Florian Hahn
3ba3ea3c06
[IVUsers] Check getExpr result in findAddRecForLoop.
This fixes a crash if the SCEV for the use isn't invertible and nullptr
is returned.

Fixes https://github.com/llvm/llvm-project/issues/63840
2023-07-20 14:56:19 +01:00
Nikita Popov
ddb46abd3c [LSR] Don't consider users of constant outside loop
In CollectLoopInvariantFixupsAndFormulae(), LSR looks at users
outside the loop. E.g. if we have an addrec based on %base, and
%base is also used outside the loop, then we have to keep it in a
register anyway, which may make it more profitable to use
%base + %idx style addressing.

This reasoning doesn't hold up when the base is a constant, because
the constant can be rematerialized. The lsr-memcpy.ll test regressed
when enabling opaque pointers, because inttoptr (i64 6442450944 to ptr)
now also has a use outside the loop (previously it didn't due to a
pointer type difference), and that extra "use" results in worse use
of addressing modes in the loop. However, the use outside the loop
actually gets rematerialized, so the alleged register saving does
not occur.

The same reasoning also applies to other types of constants, such
as global variable references.

Differential Revision: https://reviews.llvm.org/D155073
2023-07-13 12:22:38 +02:00
Nikita Popov
e8a5df7beb [LSR] Add test variant with global variables (NFC)
A variant of the test using globals instead of inttoptr expressions
for D155073.
2023-07-13 12:12:48 +02:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
Nikita Popov
6c388e06f5 [LSR] Convert test to opaque pointers (NFC)
This regresses with opaque pointers. I'll submit a patch to recover
the regression.
2023-07-12 14:07:25 +02:00
Nikita Popov
4ec3ea8afa [LSR] Convert some tests to opaque pointers (NFC)
These no longer show codegen regressions.
2023-07-12 11:48:44 +02:00
Nikita Popov
bd0710c221 [LSR] Move test to target specific directory (NFC)
Uses an x86 triple.
2023-07-12 11:44:09 +02:00
Nikita Popov
d69033d245 [SCEVExpander] Fix GEP IV inc reuse logic for opaque pointers
Instead of checking the pointer type, check the element type of
the GEP.

Previously we ended up reusing GEP increments that were not in
expanded form, thus not respecting LSR's choice of representation.

The change in 2011-10-06-ReusePhi.ll recovers a regression that
appeared when converting that test to opaque pointers.

Changes in various Thumb tests now compute the step outside the
loop instead of using add.w inside the loop, which is LSR's
preferred representation for this target.
2023-07-12 11:32:13 +02:00
Nikita Popov
7a21efce72 [LSR] Move test to target-specific directory (NFC) 2023-07-12 10:10:49 +02:00
Nikita Popov
cfa9275888 [LSR] Convert some tests to opaque pointers (NFC) 2023-07-12 09:46:08 +02:00
Nikita Popov
7a78756118 [LSR] Regenerate test checks (NFC) 2023-07-12 09:40:10 +02:00
Florian Hahn
69ca5c9d62
[SCEV] Add flag to control invertible check for normalization.
When normalizing a SCEV expression during expansion, there should be
no need for it to be invertible, as it will only be used for code
generation. This fixes a crash after 7f5b15ad150e.

Fixes https://github.com/llvm/llvm-project/issues/63678.
2023-07-05 18:11:44 +01:00
Florian Hahn
7f5b15ad15
[LSR] Move normalization check to normalizeForPostIncUse.
Move the logic added in 3a57152d85e1 to normalizeForPostIncUse to catch
additional un-invertible cases. This fixes another mis-compile pointed
out by @peixin in D153004.
2023-07-04 11:56:51 +01:00
Florian Hahn
02591d26b9
[LSR] Add test for another normalization miscompile.
Based on @peixin test case shared in D153004.
2023-07-03 18:57:31 +01:00
Fangrui Song
d39b4ce3ce [test] Replace aarch64-*-eabi with aarch64
Using "eabi" for aarch64 targets is a common mistake and is warned about by the Clang Driver.
We want to avoid it elsewhere as well. Just use the common "aarch64" without
other triple components.
2023-06-27 20:02:52 -07:00
Nikita Popov
b51153792b [LSR] Convert some tests to opaque pointers (NFC) 2023-06-23 17:13:57 +02:00
Nikita Popov
2c9aba9352 [LSR] Regenerate test checks (NFC) 2023-06-23 17:06:51 +02:00