1143 Commits

Author SHA1 Message Date
paperchalice
ef57977f2a
[NewPM][Hexagon] Add HexagonPassRegistry.def (#86244)
Prepare for dag-isel; also migrate some test cases.
2024-03-23 15:02:27 +08:00
Jonas Paulsson
7564566779 Reapply "Move assertion for AdjustsStack from PEI to MachineVerifier (#85698)"
- The check is now actually done in both PEI and the MachineVerifier.
- More .mir tests trivially updated with "adjustsStack: true" as needed.
2024-03-21 20:24:57 -04:00
Jonas Paulsson
9ebd329ad8 Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)"
This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92.

Reverting due to verifier complaints with expensive checks on build-bot.
2024-03-20 11:48:30 -04:00
Jonas Paulsson
05bde30585
Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)
Have the verifier report a missing AdjustsStack flag rather than waiting until
PEI asserts.
2024-03-20 10:29:12 -04:00
quic-areg
31f4b329c8
[Hexagon] ELF attributes for Hexagon (#85359)
Defines a subset of attributes and emits them to a section called
.hexagon.attributes.

The current attributes recorded are the attributes needed by
llvm-objdump to automatically determine target features and eliminate
the need to manually pass features.
2024-03-19 16:22:30 -05:00
Jonas Paulsson
09bc6abba6
[MachineFrameInfo] Refactoring around computeMaxCallFrameSize() (NFC) (#78001)
- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.

- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
2024-03-18 10:37:59 -04:00
Nikita Popov
20b15e645c [Tests] Drop inrange attribute from some tests (NFC)
These don't actually test anything related to inrange, so drop the
attribute.
2024-03-13 11:49:16 +01:00
yandalur
f7d354af57
[Hexagon] Fix shift value when folding shl DAG node (#83853)
When folding (or (shl xx, s), (zext y)) to (COMBINE (shl xx, s-32), y),
fix resulting shift value in HexagonISD::COMBINE node to not generate
negative values.

---------

Co-authored-by: Yashas Andaluri <yandalur@qti.qualcomm.com>
2024-03-06 08:17:02 -06:00
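The fold named in the "Fix shift value when folding shl DAG node" entry above can be checked with plain integer arithmetic. The following is a minimal sketch, not the backend code: it assumes `xx` is a 64-bit value, `y` is a 32-bit value, and `COMBINE(hi, lo)` concatenates two 32-bit halves. The helper names `combine`, `reference` and `fold` are hypothetical, and declining the fold for `s < 32` is one plausible way to avoid a negative shift, not necessarily what the patch does.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def combine(hi, lo):
    """Model COMBINE(hi, lo): concatenate two 32-bit halves into 64 bits."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def reference(xx, s, y):
    """Unfolded form: (or (shl xx, s), (zext y)) on 64-bit values."""
    return ((xx << s) & MASK64) | (y & MASK32)


def fold(xx, s, y):
    """Folded form: COMBINE(shl(lo32(xx), s - 32), y).

    Returns None when s - 32 would be negative (or s is out of range),
    because the low half of the shifted value is then not all zero.
    """
    if not 32 <= s < 64:
        return None
    hi = ((xx & MASK32) << (s - 32)) & MASK32
    return combine(hi, y)
```

For every shift amount from 32 to 63 the folded and unfolded forms agree; for smaller shift amounts the sketch refuses to fold.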
Douglas Yung
edd0ef4f3c Add "REQUIRES: asserts" to 2 tests added in #83379 using "-debug-only" run arguments.
2024-03-01 01:06:42 -05:00
Sumanth Gundapaneni
ca9d2e923b
[Hexagon] Add Loop Alignment pass. (#83379)
Inspect a basic block and, if it is a single-basic-block loop with a small
number of instructions, set the loop alignment to 32 bytes. This avoids a
cache line break in the first packet of the loop, which would otherwise cause
a stall on each execution of the loop.
2024-02-29 16:57:33 -06:00
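The motivation in the "Add Loop Alignment pass" entry above comes down to address arithmetic. The sketch below only illustrates that arithmetic: the 16-byte packet size used in the example and the helper names are assumptions, not values taken from the pass.

```python
def crosses_boundary(start, size, boundary=32):
    """True if the byte range [start, start + size) spans a boundary."""
    return start // boundary != (start + size - 1) // boundary


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment
```

A 16-byte packet starting at address 24 straddles the 32-byte boundary; after aligning the loop head to 32 bytes it no longer does.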
Sumanth Gundapaneni
f44c3facca
Revert "[Hexagon] Optimize post-increment load and stores in loops. (#82418)" (#83151)

This reverts commit d62ca8def395ac165f253fdde1d93725394a4d53.
2024-02-27 12:50:22 -06:00
Sumanth Gundapaneni
aaf2d078b6
[Hexagon] Clean up redundant transfer instructions. (#82663)
This patch adds a Hexagon specific backend pass that cleans up redundant
transfers after register allocation.
2024-02-22 17:31:37 -06:00
yandalur
6599c022be
[HEXAGON] Fix bit boundary for isub_hi in HexagonBitSimplify (#82336)
Use bit boundary of 32 for high subregisters in HexagonBitSimplify. This
fixes the subregister used in an upper half register store.
2024-02-22 11:48:06 -06:00
Sumanth Gundapaneni
d62ca8def3
[Hexagon] Optimize post-increment load and stores in loops. (#82418)
This patch optimizes the post-increment instructions so that we can
packetize them together.
v1 = phi(v0, v3')
v2,v3  = post_load v1, 4
v2',v3'= post_load v3, 4

This can be optimized in two ways

v1 = phi(v0, v3')
v2,v3' = post_load v1, 8
v2' = load v1, 4
2024-02-21 19:50:47 -06:00
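The rewrite shown in the "Optimize post-increment load and stores in loops" entry above can be modelled on a toy memory to see that both forms load the same values and produce the same next pointer. This is a sketch of the intended equivalence only, with memory as a dictionary and hypothetical helper names; it says nothing about the codegen problems that led to the reverts.

```python
def post_load(mem, addr, inc):
    """Load from addr, then return the loaded value and addr + inc."""
    return mem[addr], addr + inc


def load(mem, base, offset):
    """Plain load from base + offset."""
    return mem[base + offset]


def original_iteration(mem, v1):
    v2, v3 = post_load(mem, v1, 4)
    v2p, v3p = post_load(mem, v3, 4)   # depends on v3 from the first load
    return v2, v2p, v3p


def optimized_iteration(mem, v1):
    v2, v3p = post_load(mem, v1, 8)    # single pointer update by 8
    v2p = load(mem, v1, 4)             # no longer depends on the first load
    return v2, v2p, v3p
```

In the optimized form the second load uses only `v1`, so the two memory operations are independent of each other.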
Sumanth Gundapaneni
4c0fdcdb33
[Hexagon] Generate absolute-set load/store instructions. (#82034)
The optimization finds loads/stores of a specific form and translates
the first load/store to an absolute-set form, thereby optimizing out the
transfer and eliminating the constant extenders.
2024-02-21 19:50:29 -06:00
Sumanth Gundapaneni
1219214a3b
[Hexagon] Update InstrInfo to include LD/ST offsets of vector instructions (#82386)
The hook HexagonInstrInfo::isValidOffset() is updated to evaluate
offsets of missed LD/ST vector instructions.
2024-02-20 15:29:05 -06:00
Krasimir Georgiev
49a8fc0da4 Revert "[Hexagon] Optimize post-increment load and stores in loops. (#82011)"
This reverts commit 0e6a48c3e8cc53f9eb5945ec04f8e03f6d2bae37.

Temporary revert as it causes bad codegen: https://github.com/llvm/llvm-project/pull/82011#issuecomment-1951426107
2024-02-20 12:15:23 +00:00
Sumanth Gundapaneni
0e6a48c3e8
[Hexagon] Optimize post-increment load and stores in loops. (#82011)
This patch optimizes the post-increment instructions so that we can
packetize them together.
v1 = phi(v0, v3')
v2,v3  = post_load v1, 4
v2',v3'= post_load v3, 4

This can be optimized in two ways

v1 = phi(v0, v3')
v2,v3' = post_load v1, 8
v2' = load v1, 4
2024-02-16 16:47:54 -06:00
sgundapa
de16a05af0
[Hexagon] Fix zero extension of bit predicates with vtrunehb (#81772)
Vector extension from v4i1 to v4i8 generates an incorrect word. This
patch uses a vtrunehb for truncation to fix the bug.
2024-02-14 13:10:18 -06:00
Ikhlas Ajbar
76e3759d8d
[Hexagon] Order objects on the stack by their alignments (#81280)
This patch sorts stack objects by their alignment value from the largest
to the smallest. If two objects have the same alignment, then they are
sorted by their size from the largest to the smallest. This minimizes
padding and reduces run time stack size.
2024-02-10 14:42:50 -06:00
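The ordering described in the "Order objects on the stack by their alignments" entry above is a two-key sort. The sketch below assumes a simplified layout model in which objects are `(size, alignment)` tuples placed at increasing offsets; the function names are hypothetical and real frame lowering has more constraints.

```python
def frame_size(objects):
    """Total bytes used when (size, align) objects are laid out in order."""
    offset = 0
    for size, align in objects:
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
    return offset


def order_by_alignment(objects):
    """Largest alignment first; ties broken by largest size first."""
    return sorted(objects, key=lambda obj: (-obj[1], -obj[0]))
```

For the objects `[(1, 1), (8, 8), (2, 2), (4, 4)]`, the original order needs 24 bytes under this model and the sorted order needs 15.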
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC)
2024-02-05 14:07:09 +01:00
quic-asaravan
dc5b4daae7
[HEXAGON] Inlining Division (#79021)
This patch inlines float division function calls for Hexagon.

Co-authored-by: Awanish Pandey <awanpand@codeaurora.org>
2024-01-24 09:30:33 -06:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
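The canonicalization described in the "Canonicalize constant GEPs to i8 source element type" entry above amounts to collapsing constant indices into a single byte offset. The sketch below covers only array and scalar indexing, where each index is multiplied by a fixed stride; struct field offsets are left out, and both function names are hypothetical.

```python
def byte_offset(strides, indices):
    """Sum of index * stride over all constant GEP indices."""
    assert len(strides) == len(indices)
    return sum(stride * index for stride, index in zip(strides, indices))


def canonical_gep(base, strides, indices):
    """Render the equivalent GEP with an i8 source element type."""
    return f"getelementptr i8, ptr {base}, i64 {byte_offset(strides, indices)}"
```

Under this model, `getelementptr [4 x i32], ptr %p, i64 0, i64 2` (strides 16 and 4) and `getelementptr i16, ptr %p, i64 4` (stride 2) both become `getelementptr i8, ptr %p, i64 8`, which makes their equality obvious to later passes.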
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Igor Kirillov
63917e1975
[MachineLICM] Allow hoisting loads from invariant address (#70796)
Sometimes, loads can appear in a loop after the LICM pass is executed
for the final time. For example, the ExpandMemCmp pass creates loads in a loop,
and one of the operands may be an invariant address.
This patch extends the pre-regalloc stage of MachineLICM to allow
hoisting invariant loads from loops that don't have any stores or calls,
and to allow load reorderings.
2023-11-16 11:12:10 +00:00
Nikita Popov
e4a4122eb6
[IR] Remove zext and sext constant expressions (#71040)
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.

There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-03 10:46:07 +01:00
Sundeep
4554eac5d4
Update call-long1.ll
[llvm][test][Hexagon] NFC: test commit
2023-10-23 11:55:42 -05:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Dmitriy Smirnov
e13bed4c5f [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
JP Lehr
e816c89c84 Revert "InlineSpiller: Consider if all subranges are the same when avoiding redundant spills"
This reverts commit d8127b2ba8a87a610851b9a462f2fc2526c36e37.
2023-10-02 06:26:33 -05:00
Matt Arsenault
d8127b2ba8 InlineSpiller: Consider if all subranges are the same when avoiding redundant spills
This avoids some redundant spills of subranges, and avoids a compile failure.
This greatly reduces the number of spills in a loop.

The main range is not informative when multiple instructions are needed to fully define
a register. A common scenario is a lowered reg_sequence where every subregister
is sequentially defined, but each def changes the main range's value number. If
we look at specific lanes at the use index, we can see the value is actually the
same.

In this testcase, there are a large number of materialized 64-bit constant defs
which are hoisted outside of the loop by MachineLICM. These are feeding REG_SEQUENCEs,
which are not considered rematerializable inside the loop. After coalescing, the split
constant defs produce main ranges with an apparent phi def. There's no phi def if you look
at each individual subrange, and only half of the register is really redefined to a constant.

Fixes: SWDEV-380865

https://reviews.llvm.org/D147079
2023-10-01 11:37:53 +03:00
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
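The effect described in the "Renumber slot indexes before register allocation" entry above can be illustrated with a list of integers. This sketch assumes a hypothetical default spacing of 16 between instructions and models the approximate distance as an index difference divided by that spacing; the real `SlotIndexes` data structure is more involved.

```python
INSTR_DIST = 16  # hypothetical default spacing between instructions


def approx_instr_distance(a, b):
    """Estimate how many instructions lie between two slot indexes."""
    return abs(b - a) // INSTR_DIST


def renumber(indexes):
    """Assign evenly spaced indexes, discarding insertion history."""
    return [position * INSTR_DIST for position in range(len(indexes))]
```

With indexes `[0, 16, 20, 24, 32]`, where two instructions were squeezed in after the initial numbering, the first and last instruction appear to be 2 apart. After renumbering to `[0, 16, 32, 48, 64]` the estimate is 4, which matches the actual instruction count.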
Guozhi Wei
cbdccb30c2 [RA] Split a virtual register in cold blocks if it is not assigned preferred physical register
If a virtual register is not assigned its preferred physical register, some
COPY instructions will be turned into real register move instructions. In this
case we can try to split the virtual register in colder blocks. If that succeeds, the
original COPY instructions can be deleted, and the new COPY instructions in the
colder blocks will be generated as register move instructions. This results in
fewer dynamic register move instructions being executed.

The new test case split-reg-with-hint.ll gives an example: the hot path contains
24 instructions without this patch and only 4 instructions with it.

Differential Revision: https://reviews.llvm.org/D156491
2023-09-15 19:52:50 +00:00
Fangrui Song
cfc1a87878 [test] Change llc -march= to -mtriple= & llvm-mc -arch= to -triple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449
2023-09-11 15:11:01 -07:00
Fangrui Song
806761a762 [test] Change llc -march= to -mtriple=
The issue is uncovered by #47698: for IR files without a target triple,
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense, e.g. riscv64-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead
of rejecting it outright.
2023-09-11 14:42:37 -07:00
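The difference between the two flags described in the "Change llc -march= to -mtriple=" entry above can be modelled as string manipulation on a triple. This is a simplified sketch, not the tool's actual option handling; the function names are hypothetical and the default triple in the example is an assumed host.

```python
def apply_march(default_triple, arch):
    """Model -march=: replace only the architecture component."""
    _, _, rest = default_triple.partition("-")
    return f"{arch}-{rest}"


def apply_mtriple(default_triple, triple):
    """Model -mtriple=: the given triple replaces the default entirely."""
    return triple
```

On an assumed `x86_64-apple-darwin` host, `apply_march(..., "riscv64")` yields `riscv64-apple-darwin`, the nonsensical combination mentioned in the commit message, while `apply_mtriple(..., "hexagon")` yields `hexagon` regardless of the host.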
Serguei Katkov
a701b7e368 [CGP] Remove dead PHI nodes before elimination of mostly empty blocks
Before eliminating mostly empty blocks, it makes sense to remove dead PHI nodes.
This opens up more opportunities for elimination and also removes the dead code itself.

This change caused many unit tests to fail. I updated some of them and
disabled this optimization for the others.
The pattern I observed in the tests is an infinite loop
without side effects. As a result, after a dead PHI node is eliminated, all other
related instructions are also removed, and the tests stop checking what they were expected to check.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D158503
2023-08-29 04:35:06 +00:00
Nikita Popov
69bd66b3ce [Tests] Remove some and/or constant expressions in tests (NFC)
In preparation for their removal in D158081.
2023-08-21 12:05:32 +02:00
Konstantina Mitropoulou
2c5d1b5ab7 [DAGCombiner] Reassociate the operands from (OR (OR(CMP1, CMP2)), CMP3) to (OR (OR(CMP1, CMP3)), CMP2)
This happens when CMP1 and CMP3 have the same predicate (or CMP2 and CMP3 have
the same predicate).

This helps optimizations such as the following one:
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156215
2023-08-08 20:08:01 -07:00
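The identities that the DAGCombiner reassociation entry above is meant to enable can be checked by brute force on small integers. The sketch also includes a toy `reassociate` that groups compares sharing a predicate. Both function names and the tuple encoding `(predicate, lhs, rhs)` are hypothetical, and leaving the grouping unchanged when the first two compares already match is an assumption.

```python
from itertools import product


def min_max_identities_hold(values):
    """Exhaustively check the compare/min/max identities over `values`."""
    return all(
        ((a < c or b < c) == (min(a, b) < c))
        and ((a < c and b < c) == (max(a, b) < c))
        and ((a > c or b > c) == (max(a, b) > c))
        and ((a > c and b > c) == (min(a, b) > c))
        for a, b, c in product(values, repeat=3)
    )


def reassociate(cmp1, cmp2, cmp3):
    """Regroup (OR (OR cmp1, cmp2), cmp3) so the inner pair shares a predicate."""
    if cmp1[0] == cmp2[0]:
        return (cmp1, cmp2), cmp3   # already grouped
    if cmp1[0] == cmp3[0]:
        return (cmp1, cmp3), cmp2
    if cmp2[0] == cmp3[0]:
        return (cmp2, cmp3), cmp1
    return (cmp1, cmp2), cmp3       # no matching predicates
```

Because OR is associative and commutative, the regrouping does not change the result; it only places the two compares with the same predicate next to each other so the min/max fold can match.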
Jay Foad
68a0a37371 [AggressiveAntiDepBreaker] Tweak the fix for renaming a subregister of a live register
This patch tweaks the fix in D20627 "Do not rename registers that do not
start an independent live range" to only consider Data dependencies, not
Output or Anti dependencies. An Output or Anti dependency to a superreg
does not imply that that superreg is live at the current instruction.

This enables breaking anti-dependencies in a few more cases as shown by
the lit test updates.

Differential Revision: https://reviews.llvm.org/D156879
2023-08-07 15:41:40 +01:00
Nikita Popov
063b37e7b4 Reapply [IR] Mark and/or constant expressions as undesirable
Reapply after D156401, which stops PatternMatch from recognizing
binop constant expressions, which should avoid the infinite loops
and assertion failures this patch previously exposed.

-----

In preparation for removing support for and/or expressions, mark
them as undesirable. As such, we will no longer implicitly create
such expressions, but they still exist.
2023-07-31 09:54:24 +02:00
Jay Foad
58642565ec [Hexagon] Add machine verification to some tests
This is to help catch problems in D156552 that only showed up in an
expensive checks build.
2023-07-30 12:07:34 +01:00
Jay Foad
e2e3f06813 Revert "[MachineScheduler] Track physical register dependencies per-regunit"
This reverts commit 1a54671d5405a39de362e9692ce963c0638023bc.

It was causing lit test failures in a LLVM_ENABLE_EXPENSIVE_CHECKS
build.
2023-07-29 18:05:25 +01:00
Jay Foad
1a54671d54 [MachineScheduler] Track physical register dependencies per-regunit
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Differential Revision: https://reviews.llvm.org/D156552
2023-07-29 15:34:53 +01:00
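The AMDGPU example in the "Track physical register dependencies per-regunit" entry above can be reproduced with a small model of data dependencies. The sketch assumes each register maps to a set of regunits and tracks only the most recent definition of each regunit. The table `REGUNITS`, the unit names, and `data_edges` are hypothetical and much simpler than the scheduler's real bookkeeping.

```python
# Hypothetical register-to-regunit table for the example in the commit message.
REGUNITS = {
    "$sgpr0": {"sgpr0"},
    "$vgpr0": {"vgpr0"},
    "$vgpr1": {"vgpr1"},
    "$vgpr0_vgpr1": {"vgpr0", "vgpr1"},
}


def data_edges(instrs):
    """Return data-dependency edges (def index, use index) per regunit.

    `instrs` is a list of (defs, uses) pairs, each a list of register names.
    """
    last_def = {}
    edges = set()
    for idx, (defs, uses) in enumerate(instrs):
        for reg in uses:
            for unit in REGUNITS[reg]:
                if unit in last_def:
                    edges.add((last_def[unit], idx))
        for reg in defs:
            for unit in REGUNITS[reg]:
                last_def[unit] = idx
    return edges
```

For the three instructions SU(0), SU(1) and SU(2) from the commit message, this produces only the edges `(0, 1)` and `(1, 2)`. The extra edge from SU(0) to SU(2) does not appear, because the latest definition of the `vgpr1` unit before SU(2) is SU(1).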
Matthew Voss
380dbfd8ca Revert "Reapply [IR] Mark and/or constant expressions as undesirable"
This reverts commit 0cab8d20417c0e2ccc1ffc5505e080126f5de8e6.

Reverted due to an LTO crash. I've put a reduced test case here:
https://github.com/llvm/llvm-project/issues/64114
2023-07-26 12:54:07 -07:00
Nikita Popov
0cab8d2041 Reapply [IR] Mark and/or constant expressions as undesirable
This reapplies the change for and, but also marks or as undesirable
at the same time. Only handling one of them can cause infinite
combine loops due to the asymmetric handling.

-----

In preparation for removing support for and/or expressions, mark
them as undesirable. As such, we will no longer implicitly create
such expressions, but they still exist.
2023-07-25 15:31:45 +02:00
Nathan Chancellor
17f4f262fc
Revert "Reapply [IR] Mark and constant expressions as undesirable"
This reverts commit 086ee99564afbb11449c08ea2e094f7f49fadde5.

This patch causes an infinite loop when building arch/mips/mm/c-r4k.c in
the Linux kernel. See the comment in Phabricator for a reduced
reproducer: https://reviews.llvm.org/rG086ee99564afbb11449c08ea2e094f7f49fadde5
2023-07-21 15:57:03 -07:00
Nikita Popov
086ee99564 Reapply [IR] Mark and constant expressions as undesirable
Reapply after fixing an issue in canonicalizeLogicFirst() exposed
by this change (218f97578b26f7a89f7f8ed0748c31ef0181f80a).

-----

In preparation for removing support for and expressions, mark them
as undesirable. As such, we will no longer implicitly create such
expressions, but they still exist.
2023-07-21 10:10:50 +02:00
Nikita Popov
9dc391e89c Revert "[IR] Mark add constant expressions as undesirable"
This reverts commit f8a36d8c3e264c4fccf8058e699201a452ea7bb7.

I believe this is causing an assertion failure on the
sanitizer-x86_64-linux buildbot:

clang++: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/Casting.h:578: decltype(auto) llvm::cast(From *) [To = llvm::BinaryOperator, From = llvm::Value]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

  #10 0x000055bdd7e82408 canonicalizeLogicFirst(llvm::BinaryOperator&, llvm::IRBuilder<llvm::TargetFolder, llvm::IRBuilderCallbackInserter>&) /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2131:5
  #11 0x000055bdd7e80183 llvm::InstCombinerImpl::visitAnd(llvm::BinaryOperator&) /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2661:20

Likely the code is encountering a constant expression in a case it
didn't before.
2023-07-20 18:09:17 +02:00
Nikita Popov
f8a36d8c3e [IR] Mark add constant expressions as undesirable
In preparation for removing support for add expressions, mark them
as undesirable. As such, we will no longer implicitly create such
expressions, but they still exist.
2023-07-20 15:24:19 +02:00