156 Commits

Author SHA1 Message Date
Kazu Hirata
0028ef667a
[llvm] Remove unused local variables (NFC) (#167106)
Identified with bugprone-unused-local-non-trivial-variable.
2025-11-08 07:41:07 -08:00
Simon Tatham
f36f2bff84
[ARM][MVE] Invalid tail predication in LowOverheadLoop pass (#163941)
When a loop is converted into a low-overhead loop using tail predication
via FPSCR.LTPSIZE, the MQPRCopy pseudo-instruction is expanded into
either two VMOVD or a single MVE_VORR, depending on whether the values
written to the lanes with a 'false' predicate matter. (MVE_VORR uses the
ambient LTPSIZE predicate, so it won't write those lanes at all; the
double VMOVD is slower but gets them right.)

This check was done based on whether the output of the MQPRCopy is live
coming out of the loop. But it missed a case where the live-out value is
not _itself_ an MQPRCopy, but is a predicated operation taking its false
lanes from an MQPRCopy.

Fixes #162644, and adds a new MIR test case derived from the reproducer
in that bug.
2025-10-22 09:09:40 +01:00
Simon Tatham
8da0df4956
[TableGen] List the indices of sub-operands (#163723)
Some instances of the `Operand` class used in Tablegen instruction
definitions expand to a cluster of multiple operands at the MC layer,
such as complex addressing modes involving base + offset + shift, or
clusters of operands describing conditional Arm instructions or
predicated MVE instructions. There's currently no convenient way for C++
code to know the offset of one of those sub-operands from the start of
the cluster: instead it just hard-codes magic numbers like `index+2`,
which is hard to read and fragile.

This patch adds an extra piece of output to `InstrInfoEmitter` to define
those instruction offsets, based on the name of the `Operand` class
instance in Tablegen, and the names assigned to the sub-operands in the
`MIOperandInfo` field. For example, if target Foo were to define

  def Bar : Operand {
    let MIOperandInfo = (ops GPR:$first, i32imm:$second);
    // ...
  }

then the new constants would be `Foo::SUBOP_Bar_first` and
`Foo::SUBOP_Bar_second`, defined as 0 and 1 respectively.

As an example, I've converted some magic numbers related to the MVE
predication operand types (`vpred_n` and its superset `vpred_r`) to use
the new named constants in place of the integer literals they previously
used. This is more verbose, but also clearer, because it explains why
the integer is chosen instead of what its value is.
2025-10-21 09:29:18 +01:00
Mikhail Gudim
562146499c
[CodeGen][NewPM] Port ReachingDefAnalysis to new pass manager. (#159572)
In this commit:
  (1) Added new pass manager support for `ReachingDefAnalysis`.
  (2) Added printer pass.
  (3) Make old pass manager use `ReachingDefInfoWrapperPass`
2025-09-19 09:38:34 -04:00
Kazu Hirata
94fc76a360
[ARM] Remove unnecessary casts (NFC) (#148391)
getRegisterInfo() already returns const ARMBaseRegisterInfo *.

Likewise, getInstrInfo() already returns const ARMBaseInstrInfo *.
2025-07-12 15:46:04 -07:00
Rahul Joshi
52c2e45c11
[NFC][CodeGen] Adopt MachineFunctionProperties convenience accessors (#141101) 2025-05-23 08:30:29 -07:00
Kazu Hirata
41b76119ec
[llvm] Use range constructors for *Set (NFC) (#132636) 2025-03-23 15:50:34 -07:00
Kazu Hirata
71935281e0
[Target] Use *Set::insert_range (NFC) (#132140)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-20 09:09:30 -07:00
Kazu Hirata
9571cc2b28
[ARM] Remove unused includes (NFC) (#115995)
Identified with misc-include-cleaner.
2024-11-12 23:15:21 -08:00
Kazu Hirata
cee65092c9
[ARM] Avoid repeated hash lookups (NFC) (#109602) 2024-09-23 06:42:06 -07:00
Kazu Hirata
d84411f686 [ARM] clang-format ARMLowOverheadLoops.cpp (NFC)
I'm going to touch this area in a subsequent patch.
2024-09-22 08:31:20 -07:00
paperchalice
79d0de2ac3
[CodeGen][NewPM] Port machine-loops to new pass manager (#97793)
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-09 09:11:18 +08:00
Kazu Hirata
ddaa93b095
[llvm] Use std::make_unique (NFC) (#97165)
This patch is based on clang-tidy's modernize-make-unique but limited
to those cases where type names are mentioned twice like
std::unique_ptr<Type>(new Type()), which is a bit mouthful.
2024-06-29 11:50:41 -07:00
David Green
5fe7307146
[ARM] Don't block tail-predication from unrelated VPT blocks. (#94239)
VPT blocks that do not produce an interesting 'output' (like a stored
value or reduction result), do not need to be predicated on vctp for the
whole loop to be tail-predicated. Just producing results for the valid
tail predication lanes should be enough.
2024-06-06 11:43:39 +01:00
David Green
850f30c3ba [ARM][MVE] Don't allow tail-predication with else predicates
The test case contains a vpt block with an else predicated instruction. This
might not be very unrealistic, but currently crashes due to not being able to
handle the else. The instruction would need to be removed. This patch adds some
extra checks that none of the instructions in vpt block is else predicated,
leaving it using vctp.
2024-05-29 09:08:32 +01:00
David Green
af36fb00e3 [ARM] Remove static variables from ARMLowOverheadLoops. NFC
VPTState was holding static state, and acting as both the info in a VPTBlock
and the overall state of all the blocks in the loop. This has been split up
into a class (VPTBlock) to hold the instructions of one block, and VPTState
that holds the overall state. The PredicatedInsts is also made into a
map<MachineInstr *, SetVector<MachineInstr *>>, as the double-storing of MI
inside a unique pointer is unneeded.
2024-05-28 07:49:34 +01:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
Kai Nacke
21d177096f
[NFC] Refactor looping over recomputeLiveIns into function (#88040)
https://github.com/llvm/llvm-project/pull/79940 put calls to
recomputeLiveIns into
a loop, to repeatedly call the function until the computation converges.
However,
this repeats a lot of code. This changes moves the loop into a function
to simplify
the handling.

Note that this changes the order in which recomputeLiveIns is called.
For example,

```
  bool anyChange = false;
  do {
    anyChange = recomputeLiveIns(*ExitMBB) || recomputeLiveIns(*LoopMBB);
  } while (anyChange);
```

only begins to recompute the live-ins for LoopMBB after the computation
for ExitMBB
has converged. With this change, all basic blocks have a recomputation
of the live-ins
for each loop iteration. This can result in less or more calls,
depending on the
situation.
2024-04-15 17:12:25 -04:00
Oskar Wirga
ff4636a4ab
Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940)
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498

> Currently, the way that recomputeLiveIns works is that it will
recompute the livein registers for that MachineBasicBlock but it matters
what order you call recomputeLiveIn which can result in incorrect
register allocations down the line.

Now we do not recompute the entire CFG but we do ensure that the newly
added MBB do reach convergence.
2024-01-30 19:33:04 -08:00
Nikita Popov
07a1925b8b Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)"
This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b.

Introduces a major compile-time regression.
2024-01-26 22:33:17 +01:00
Oskar Wirga
59bf60519f
Refactor recomputeLiveIns to operate on whole CFG (#79498)
Currently, the way that recomputeLiveIns works is that it will recompute
the livein registers for that MachineBasicBlock but it matters what
order you call recomputeLiveIn which can result in incorrect register
allocations down the line.

This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
2024-01-26 11:25:36 -08:00
Kazu Hirata
01702c3f7f [llvm] Stop including llvm/ADT/SmallSet.h (NFC)
Identified with clangd.
2023-11-11 12:32:15 -08:00
Kazu Hirata
c64d0a7f10 [ARM] Remove unused declaration isSafeToDefineLR
The corresponding function definition was removed by:

  commit e82a0084d322948b94a5ca3213237d5eeab4920f
  Author: Sam Parker <sam.parker@arm.com>
  Date:   Fri Sep 25 09:36:40 2020 +0100
2023-05-16 19:12:52 -07:00
Jakub Kuderski
a0a76804c4 [ADT] Allow llvm::enumerate to enumerate over multiple ranges
This does not work by a mere composition of `enumerate` and `zip_equal`,
because C++17 does not allow for recursive expansion of structured
bindings.

This implementation uses `zippy` to manage the iteratees and adds the
stream of indices as the first zipped range. Because we have an upfront
assertion that all input ranges are of the same length, we only need to
check if the second range has ended during iteration.

As a consequence of using `zippy`, `enumerate` will now follow the
reference and lifetime semantics of the `zip*` family of functions. The
main difference is that `enumerate` exposes each tuple of references
through a new tuple-like type `enumerate_result`, with the familiar
`.index()` and `.value()` member functions.

Because the `enumerate_result` returned on dereference is a
temporary, enumeration result can no longer be used through an
lvalue ref.

Reviewed By: dblaikie, zero9178

Differential Revision: https://reviews.llvm.org/D144503
2023-03-15 19:34:22 -04:00
Jay Foad
a07584d57d [CodeGen] Make more use of MachineOperand::getOperandNo. NFC.
Differential Revision: https://reviews.llvm.org/D143252
2023-02-07 11:50:57 +00:00
David Green
e828dd4302 [ARM] Don't treat arguments as producesFalseLanesZero
Invalid tail predicated loops could be formed by treating function
arguments as FalseLanesZero due to getGlobalReachingDefs not returning
any values. Make sure we check that the list of Defs is empty and if so
treat it like a unknown value.

Differential Revision: https://reviews.llvm.org/D141399
2023-01-11 15:58:38 +00:00
David Green
4c4e544cd8 [ARM] Add an option for disabling omitting DLS.
Useful for testing, this option disables when `DLS lr, lr` gets removed.
2022-09-29 17:42:45 +01:00
Kazu Hirata
2833760c57 [Target] Qualify auto in range-based for loops (NFC) 2022-08-28 17:35:09 -07:00
Zongwei Lan
ad73ce318e [Target] use getSubtarget<> instead of static_cast<>(getSubtarget())
Differential Revision: https://reviews.llvm.org/D125391
2022-05-26 11:22:41 -07:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
serge-sans-paille
ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Kazu Hirata
c5cf7d910e [ARM] Use range-based for loops (NFC) 2021-12-20 23:06:47 -08:00
Kazu Hirata
de90490060 Revert "[ARM] Use range-based for loops (NFC)"
This reverts commit 93d79cac2ede436e1e3e91b5aff702914cdfbca7.

This patch seems to break
llvm/test/CodeGen/ARM/constant-islands-cfg.mir under asan.
2021-12-20 10:51:36 -08:00
Kazu Hirata
93d79cac2e [ARM] Use range-based for loops (NFC) 2021-12-20 00:04:53 -08:00
Kazu Hirata
26bd534a79 [llvm] Use none_of instead of \!any_of (NFC) 2021-12-17 13:48:57 -08:00
Kazu Hirata
c2bb9637d9 Use llvm::any_of and llvm::all_of (NFC) 2021-12-11 11:54:37 -08:00
Michael Forster
53801a59eb Fix two unused-variable warnings. 2021-10-07 15:17:58 +02:00
David Green
73346f5848 [ARM] Introduce a MQPRCopy
Currently when creating tail predicated loops, we need to validate that
all the live-outs of a loop will be equivalent with and without tail
predication, and if they are not we cannot legally create a
tail-predicated loop, leaving expensive vctp and vpst instructions in
the loop. These notably can include register-allocation instructions
like stack loads and stores, and copys lowered from COPYs to MVE_VORRs.

Instead of trying to prove this is valid late in the pipeline, this
patch introduces a MQPRCopy pseudo instruction that COPY is lowered to.
This can then either be converted to a MVE_VORR where possible, or to a
couple of VMOVD instructions if not. This way they do not behave
differently within and outside of tail-predications regions, and we can
know by construction that they are always valid. The idea is that we can
do the same with stack load and stores, converting them to VLDR/VSTR or
VLDM/VSTM where required to prove tail predication is always valid.

This does unfortunately mean inserting multiple VMOVD instructions,
instead of a single MVE_VORR, but my experiments show it to be an
improvement in general.

Differential Revision: https://reviews.llvm.org/D111048
2021-10-07 12:52:12 +01:00
David Green
02cd8a6b91 [ARM] Allow smaller VMOVL in tail predicated loops
This allows VMOVL in tail predicated loops so long as the the vector
size the VMOVL is extending into is less than or equal to the size of
the VCTP in the tail predicated loop. These cases represent a
sign-extend-inreg (or zero-extend-inreg), which needn't block tail
predication as in https://godbolt.org/z/hdTsEbx8Y.

For this a vecsize has been added to the TSFlag bits of MVE
instructions, which stores the size of the elements that the MVE
instruction operates on. In the case of multiple size (such as a
MVE_VMOVLs8bh that extends from i8 to i16, the largest size was be
chosen). The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 =
i64, which often (but not always) comes from the instruction encoding
directly. A unit test was added, and although only a subset of the
vecsizes are currently used, the rest should be useful for other cases.

Differential Revision: https://reviews.llvm.org/D109706
2021-09-22 12:07:52 +01:00
David Green
9cb8f4d1ad [ARM] Add a tail-predication loop predicate register
The semantics of tail predication loops means that the value of LR as an
instruction is executed determines the predicate. In other words:

mov r3, #3
DLSTP lr, r3        // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only first lane is updated.

This means that the value of lr cannot be spilled and re-used in tail
predication regions without potentially altering the behaviour of the
program. More lanes than required could be stored, for example, and in
the case of a gather those lanes might not have been setup, leading to
alignment exceptions.

This patch adds a new lr predicate operand to MVE instructions in order
to keep a reference to the lr that they use as a tail predicate. It will
usually hold the zeroreg meaning not predicated, being set to the LR phi
value in the MVETPAndVPTOptimisationsPass. This will prevent it from
being spilled anywhere that it needs to be used.

A lot of tests needed updating.

Differential Revision: https://reviews.llvm.org/D107638
2021-09-02 13:42:58 +01:00
David Green
8c50b5fbfe [ARM] Add extra debug messages for validating live outs. NFC
We are running into more and more cases where the liveouts of low
overhead loops do not validate. Add some extra debug messages to make it
clearer why.
2021-08-11 10:35:53 +01:00
Amara Emerson
4fee756c75 Delete copy-ctor of MachineFrameInfo.
I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo()
without declaring the variable as a reference.

Deleting the copy-constructor also showed a place in the ARM backend which was
doing the same thing, albeit it didn't impact correctness there from the looks of it.
2021-08-05 23:24:37 -07:00
Sam Tebbs
ff0ef6a518 [ARM][LowOverheadLoops] Make some stack spills valid for tail predication
This patch makes vector spills valid for tail predication when all loads
from the same stack slot are within the loop

Differential Revision: https://reviews.llvm.org/D105443
2021-07-15 19:23:52 +01:00
David Green
bee2f618d5 [ARM] Introduce t2WhileLoopStartTP
This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in
D90591. It keeps a reference to both the tripcount register and the
element count register, so that the ARMLowOverheadLoops pass in the
backend can pick the correct one without having to search for it from
the operand of a VCTP.

Differential Revision: https://reviews.llvm.org/D103236
2021-06-13 13:55:34 +01:00
Michael Benfield
00d19c6704 [various] Remove or use variables which are unused but set.
This is in preparation for the -Wunused-but-set-variable warning.

Differential Revision: https://reviews.llvm.org/D102942
2021-06-01 15:38:48 -07:00
David Green
543406a69b [ARM] Allow findLoopPreheader to return headers with multiple loop successors
The findLoopPreheader function will currently not find a preheader if it
branches to multiple different loop headers. This patch adds an option
to relax that, allowing ARMLowOverheadLoops to process more loops
successfully. This helps with WhileLoopStart setup instructions that can
branch/fallthrough to the low overhead loop and to branch to a separate
loop from the same preheader (but I don't believe it is possible for
both loops to be low overhead loops).

Differential Revision: https://reviews.llvm.org/D102747
2021-05-24 12:22:15 +01:00
David Green
e7a6df68a6 [ARM] Fix the operand used for WLS in ARMLowOverheadLoops
The Loop start instruction handled by the ARMLowOverheadLoops are:
$lr = t2DoLoopStart $r0
$lr = t2DoLoopStartTP $r1, $r0
$lr = t2WhileLoopStartLR $r0, %bb, implicit-def dead $cpsr
All three of these will have LR as the 0 argument, the trip count as the
1 argument.

This patch updated a few places in ARMLowOverheadLoops where the 0th arg
was being used for t2WhileLoopStartLR instructions as the trip count.
One place was entirely removed as it does not seem valid any more, the
case the code is trying to protect against should not be able to occur
with our correct-by-construction low overhead loops.

Differential Revision: https://reviews.llvm.org/D102620
2021-05-21 09:29:30 +01:00
Victor Campos
f22b4c7122 [ARM] Handle debug instrs in ARM Low Overhead Loop pass
In function ConvertVPTBlocks(), it is assumed that every instruction
within a vector-predicated block is predicated. This is false for debug
instructions, used by LLVM.

Because of this, an assertion failure is reached when an input contains
debug instructions inside VPT blocks. In non-assert builds, an out of
bounds memory access took place.

The present patch properly covers the case of debug instructions.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99075
2021-03-23 11:49:06 +00:00