169 Commits

Author SHA1 Message Date
Florian Hahn
a4573ee38d
[LoopUnroll] UnrollRuntimeMultiExit takes precedence over TTI. (#134259)
Update UnrollRuntimeLoopRemainder to always give priority to the
UnrollRuntimeMultiExit option, if provided.

After ad9da92cf6f7357 (https://github.com/llvm/llvm-project/pull/124462),
we would ignore the option if the backend indicates multi-exit is profitable.
This means it cannot be used to disable runtime unrolling.

To be consistent with canProfitablyRuntimeUnrollMultiExitLoop, always
respect the option.

This surfaced while discussing https://github.com/llvm/llvm-project/pull/131998.

PR: https://github.com/llvm/llvm-project/pull/134259
2025-04-04 10:16:50 +01:00
Florian Hahn
ad9da92cf6
[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462)
Add an extra knob to RuntimeUnrollMultiExit to let backends control
whether to allow multi-exit unrolling on a per-loop basis.

This gives backends more fine-grained control on deciding if multi-exit
unrolling is profitable for a given loop and uarch. Similar to
4226e0a0c75.

PR: https://github.com/llvm/llvm-project/pull/124462
2025-01-27 21:20:04 +00:00
Florian Hahn
4226e0a0c7
[TTI] Add SCEVExpansionBudget to loop unrolling options. (#118316)
Add an extra know to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.

This gives backends more fine-grained control on the cost of the runtime
checks for runtime unrolling.

PR: https://github.com/llvm/llvm-project/pull/118316
2024-12-02 21:35:00 +00:00
Kazu Hirata
013f4a46d1
[Utils] Remove unused includes (NFC) (#114748)
Identified with misc-include-cleaner.
2024-11-04 19:51:25 -08:00
Nikita Popov
c8e0672864 [Loops] Use forgetLcssaPhiWithNewPredecessor() in more places
Use the more aggressive invalidation method in a number of places
that add incoming values to lcssa phi nodes. It is likely that
it's possible to construct cases with incorrect SCEV preservation
similar to https://github.com/llvm/llvm-project/issues/109333 for
these.
2024-09-23 09:54:28 +02:00
Kazu Hirata
b7146aed5b
[Transforms] Construct SmallVector with ArrayRef (NFC) (#101851) 2024-08-03 15:33:08 -07:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Nikita Popov
37c736e035 [LoopUnroll] Use poison instead of undef for another preheader value 2024-06-25 12:17:20 +02:00
Nikita Popov
eeb0884e66 [LoopUnroll] Use poison instead of undef for preheader value 2024-06-25 12:09:58 +02:00
Jay Foad
d4a0154902
[llvm-project] Fix typo "seperate" (#95373) 2024-06-13 20:20:27 +01:00
Sameer Sahasrabuddhe
e0ac087ff0
[LoopUnroll] Consider convergence control tokens when unrolling (#91715)
- There is no restriction on a loop with controlled convergent
operations when
  the relevant tokens are defined and used within the loop.

- When a token defined outside a loop is used inside (also called a loop
convergence heart), unrolling is allowed only in the absence of
remainder or
  runtime checks.

- When a token defined inside a loop is used outside, such a loop is
said to be
"extended". This loop can only be unrolled by also duplicating the
extended part
  lying outside the loop. Such unrolling is disabled for now.

- Clean up loop hearts: When unrolling a loop with a heart, duplicating
the
heart will introduce multiple static uses of a convergence control token
in a
cycle that does not contain its definition. This violates the static
rules for
tokens, and needs to be cleaned up into a single occurrence of the
intrinsic.

- Spell out the initializer for UnrollLoopOptions to improve
readability.


Original implementation [D85605] by Nicolai Haehnle
<nicolai.haehnle@amd.com>.
2024-06-06 13:13:46 +05:30
Harald van Dijk
a6171900a4
[RemoveDIs] Change remapDbgVariableRecord to remapDbgRecord (#91456)
We need to remap any DbgRecord, not just DbgVariableRecords.

This is the followup to #91447.

Co-authored-by: PietroGhg <pietro.ghiglio@codeplay.com>
2024-05-08 17:02:25 +01:00
Stephen Tozer
ffd08c7759
[RemoveDIs][NFC] Rename DPValue -> DbgVariableRecord (#85216)
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:

- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.

Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:

```
  DPValue -> DbgVariableRecord
  DPVal -> DbgVarRec
  DPV -> DVR
```

Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
2024-03-19 20:07:07 +00:00
Stephen Tozer
15f3f446c5
[RemoveDIs][NFC] Rename common interface functions for DPValues->DbgRecords (#84793)
As part of the effort to rename the DbgRecord classes, this patch
renames the widely-used functions that operate on DbgRecords but refer
to DbgValues or DPValues in their names to refer to DbgRecords instead;
all such functions are defined in one of `BasicBlock.h`,
`Instruction.h`, and `DebugProgramInstruction.h`.

This patch explicitly does not change the names of any comments or
variables, except for where they use the exact name of one of the
renamed functions. The reason for this is reviewability; this patch can
be trivially examined to determine that the only changes are direct
string substitutions and any results from clang-format responding to the
changed line lengths. Future patches will cover renaming variables and
comments, and then renaming the classes themselves.
2024-03-12 14:53:13 +00:00
Nikita Popov
62ae7d976f [LoopUnroll] Fix missing sign extension
For integers larger than 64-bit, this would zero-extend a -1
value, instead of sign-extending it.

Fixes https://github.com/llvm/llvm-project/issues/80289.
2024-02-01 16:08:25 +01:00
Jeremy Morse
59fab22642 [DebugInfo][RemoveDIs] Support cloning and remapping DPValues (#72546)
This patch adds support for CloneBasicBlock duplicating the DPValues
attached to instructions, and adds facilities to remap them into their new
context. The plumbing to achieve this is fairly straightforwards and
mechanical.

I've also added illustrative uses to LoopUnrollRuntime, SimpleLoopUnswitch
and SimplifyCFG. The former only updates for the epilogue right now so I've
added CHECK lines just for the end of an unrolled loop (further updates
coming later). SimpleLoopUnswitch had no debug-info tests so I've added a
new one. The two modified parts of SimplifyCFG are covered by the two
modified SimplifyCFG tests.

These are scenarios where we have to do extra cloning for copying of
DPValues because they're no longer instructions, and remap them too.
2023-11-24 15:17:32 +00:00
Matthias Braun
b30c9c9378 LoopUnrollRuntime: Add weights to all branches
Make sure every conditional branch constructed by `LoopUnrollRuntime`
code sets branch weights.

- Add new 1:127 weights for the conditional jumps checking whether the
  whole (unrolled) loop should be skipped in the generated prolog or
  epilog code.
- Remove `updateLatchBranchWeightsForRemainderLoop` function and just
  add weights immediately when constructing the relevant branches. This
  leads to simpler code and makes the code more obvious as every call
  to `CreateCondBr` now has a `BranchWeights` parameter.
- Rework formula for epilogue latch weights, to assume equal
  distribution of remainders and remove `assert` (as I was able to
  reach this code when forcing small unroll factors on the commandline).

Differential Revision: https://reviews.llvm.org/D158642
2023-09-11 14:23:29 -07:00
Jeremy Morse
6942c64e81 [NFC][RemoveDIs] Prefer iterator-insertion over instructions
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.

At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152537
2023-09-11 11:48:45 +01:00
Yevgeny Rouban
1ebbbf1614 [LoopUnrollRuntime] Allow indirect transition to deopt non-latch exit blocks
Relax condition on runtime trip count unrolling loops with 1 non-latch exit
that leads to a deop block.

There are cases when the deopt blocks are common exits for different loops.
LoopSimplify pass splits such edges to the common deopting blocks to make
sure that all exit nodes of the loop only have predecessors that are inside
of the loop (See simplifyOneLoop()). This breaks the current condition for
unrolling. This patch allows the split transitive blocks that still lead to
the deopting blocks.

Differential Revision: https://reviews.llvm.org/D152639
2023-06-19 11:10:01 +07:00
Fangrui Song
fb8eb84e5f [Transforms,InstCombine] std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
2022-12-16 22:57:56 +00:00
Vasileios Porpodas
dc891846b8 [NFC] Cleanup: Replace Function::getBasicBlockList().splice() with Function::splice()
This is part of a series of patches that aim at making Function::getBasicBlockList() private.

Differential Revision: https://reviews.llvm.org/D139984
2022-12-14 15:34:19 -08:00
Fangrui Song
c178ed33bd Transforms/Utils: llvm::Optional => std::optional 2022-12-12 08:29:05 +00:00
Kazu Hirata
0e37ef0186 [Transforms] Fix comment typos (NFC) 2022-08-07 23:55:24 -07:00
Paul Kirth
d434e40f39 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-08-03 00:09:45 +00:00
Paul Kirth
6e9bab71b6 Revert "[llvm][NFC] Refactor code to use ProfDataUtils"
This reverts commit 300c9a78819b4608b96bb26f9320bea6b8a0c4d0.

We will reland once these issues are ironed out.
2022-07-27 21:38:11 +00:00
Paul Kirth
300c9a7881 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-07-27 21:13:54 +00:00
Kazu Hirata
611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Florian Hahn
6d5f814357
[LoopUnrollRuntime] Invalidate SCEV for exit phi in ConnectProlog.
ConnectProlog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47ec51.

Fix is analog to cfc741bc0e029.

Fixes #56286.
2022-06-29 20:28:43 +01:00
Florian Hahn
9a35f19e3e
[UnrollRuntime] Invalidate SCEVs for modified phis in ConnectEpilog.
ConnectEpilog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47ec51.

Fix is analog to cfc741bc0e029.

Fixes #56282.
2022-06-29 18:26:00 +01:00
Kazu Hirata
a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Kazu Hirata
3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
2022-06-25 11:56:50 -07:00
Kazu Hirata
aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Simon Moll
b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Nikita Popov
81c648a3d9 [LoopUnroll] Freeze tripcount rather than condition
This is a followup to D125754. We introduce two branches, one
before the unrolled loop and one before the epilogue (and similar
for the prologue case). The previous patch only froze the
condition on the first branch.

Rather than independently freezing the second condition, this patch
instead freezes TripCount and bases BECount on it. These are the
two quantities involved in the conditions, and this ensures that
both work on a consistent, non-poisonous trip count.

Differential Revision: https://reviews.llvm.org/D125896
2022-05-24 09:42:39 +02:00
Nikita Popov
323514de58 [LoopUnroll] Avoid branch on poison for runtime unroll with multiple exits
When performing runtime unrolling with multiple exits, one of the
earlier (non-latch) exits may exit the loop on the first iteration,
such that we never branch on the latch exit condition. As such, we
need to freeze the condition of the new branch that is introduced
before the loop, as it now executes unconditionally.

Differential Revision: https://reviews.llvm.org/D125754
2022-05-18 09:51:22 +02:00
serge-sans-paille
a494ae43be Cleanup includes: TransformsUtils
Estimation on the impact on preprocessor output:
before: 1065307662
after:  1064800684

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
2022-03-01 21:00:07 +01:00
Nikita Popov
84b978da3b [LoopUnrollRuntime] Remove unnecessary pointer BECount check (NFC)
BECounts are guaranteed to be integers nowadays.
2021-12-01 10:32:37 +01:00
Philip Reames
8f95e915cd [unroll-runtime] Relax two profitability limitations on multi-exit unrolling
This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change.  If anyone sees actually interesting code differences out of this, please let me know.  I'm not expecting this to have much impact at all.

Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block.  We can only hit this with a poorly documented debug flag.

Case 2 - Why should we treat epilog cases differently from prolog cases?  Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled?

Sorry for the lack of tests here.  These are both *exceedingly* narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags.  Note that the legality cases for each are already exercised with force flags.
2021-11-15 13:00:14 -08:00
Philip Reames
423da61835 [runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC]
All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller.  The assert use has little value with the restructured code and is simply dropped.
2021-11-15 11:39:07 -08:00
Philip Reames
e99902a872 [runtime-unroll] Restructure if-clause to improve readability [NFC] 2021-11-15 11:13:27 -08:00
Philip Reames
37ead201e6 [runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing.

Why does this matter?  A couple of reasons:
* SCEV doesn't have a native subtract node.  Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such.  As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones.  (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language.  We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced.  (You can see this looking at nearby phis in the test cases.)

Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.

* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value.  We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
2021-11-12 15:44:58 -08:00
Kazu Hirata
81e9c90686 [llvm] Use llvm::is_contained (NFC) 2021-10-14 22:44:09 -07:00
Kazu Hirata
abca4c012f [Utils] Use make_early_inc_range (NFC) 2021-09-13 08:57:23 -07:00
Philip Reames
fa82a3d016 [runtimeunroll] Support epilogue unrolling with a parent loop
This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop.  When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop.

Differential Revision: https://reviews.llvm.org/D108476
2021-09-02 16:29:20 -07:00
Philip Reames
45c672e20d [runtimeunroll] Under EXPENSIVE_CHECKS, validate loop info
Requested in review comment on D108476
2021-09-02 16:28:46 -07:00
Philip Reames
b604fcb7bc [runtime] Move prolog/epilog block to a post-simplify strategy
The runtime unroller will try to produce a non-loop if the unroll count is 2 and thus the prolog/epilog loop would only run at most one iteration. The old implementation did this by avoiding loop construction entirely. This patches instead constructs the trivial loop and then explicitly breaks the backedge and simplifies. This does result in some additional code churn when triggered, but a) results in better quality code and b) removes a codepath which didn't work properly for multiple exit epilogs.

One oddity that I want to draw to reviewer attention is that this somehow changes revisit order. The new order looks equivalent to me, but I don't understand how creating and erasing an extra loop here creates this effect.

Differential Revision: https://reviews.llvm.org/D108521
2021-08-31 09:29:36 -07:00
Philip Reames
d8d84c9df8 [runtimeunroll] Use early return to reduce nesting [nfc] 2021-08-22 11:34:50 -07:00
Philip Reames
17b9cb1817 [runtimeunroll] Support multiple exits to latch exit w/prolog loop
This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used.

This is the prolog companion to D107381. Since this was LGTMed, a problem with DT updating was reported against that patch.  I roled in the analogous fix here as it seemed obvious, and not worth re-review.

As an aside, our prolog form leaves a lot of potential value on the floor when there is an invariant load or invariant condition in the loop being runtime unrolled. We should probably consider a "required prolog" heuristic.  (Alternatively, maybe we should be peeling these cases more aggressively?)

Differential Revision: https://reviews.llvm.org/D108262
2021-08-19 11:43:52 -07:00
Philip Reames
447256f22b [runtimeunroll] Fix reported DT verification error after 94d0914
In 94d0914, I added support for unrolling of multiple exit loops which have multiple exits reaching the latch.  Per reports on the review post commit, I'd missed updating the domtree for one case.  This fix addresses that ommission.

There's no new test as this is covered by existing tests with expensive verification turned on.
2021-08-19 11:06:17 -07:00
Philip Reames
94d0914292 [runtimeunroll] Support multiple exits to latch exit w/epilogue loop
This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used.

I decided to restrict this to the epilogue case. Given the changes ended up being pretty generic, we may be able to unblock the prolog case too, but I want to do that in a separate change to reduce the amount of code we all have to understand at one time.

Differential Revision: https://reviews.llvm.org/D107381
2021-08-17 17:52:04 -07:00