34586 Commits

Author SHA1 Message Date
Daniel Hoekwater
866ae69cfa [AArch64] [BranchRelaxation] Optimize for hot code size in AArch64 branch relaxation
On AArch64, it is safe to let the linker handle relaxation of
unconditional branches; in most cases, the destination is within range,
and the linker doesn't need to do anything. If the linker does insert
fixup code, it clobbers the x16 inter-procedural register, so x16 must
be available across the branch before linking. If x16 isn't available,
but some other register is, we can relax the branch either by spilling
x16 OR using the free register for a manually-inserted indirect branch.

This patch builds on D145211. While that patch is for correctness, this
one is for performance of the common case. As noted in
https://reviews.llvm.org/D145211#4537173, we can trust the linker to
relax cross-section unconditional branches across which x16 is
available.

Programs that use machine function splitting care most about the
performance of hot code at the expense of the performance of cold code,
so we prioritize minimizing hot code size.

Here's a breakdown of the cases:

   Hot -> Cold [x16 is free across the branch]
     Do nothing; let the linker relax the branch.

   Cold -> Hot [x16 is free across the branch]
     Do nothing; let the linker relax the branch.

   Hot -> Cold [x16 used across the branch, but there is a free register]
     Spill x16; let the linker relax the branch.

     Spilling requires fewer instructions than manually inserting an
     indirect branch.

   Cold -> Hot [x16 used across the branch, but there is a free register]
     Manually insert an indirect branch.

     Spilling would require adding a restore block in the hot section.

   Hot -> Cold [No free regs]
     Spill x16; let the linker relax the branch.

   Cold -> Hot [No free regs]
     Spill x16 and put the restore block at the end of the hot function; let the linker relax the branch.
     Ex:
       [Hot section]
       func.hot:
         ... hot code...
       func.restore:
         ... restore x16 ...
         B func.hot

       [Cold section]
         func.cold:
         ... spill x16 ...
         B func.restore

     Putting the restore block at the end of the function instead of
     just before the destination increases the cost of executing the
     store, but it avoids putting cold code in the middle of hot code.
     Since the restore is very rarely taken, this is a worthwhile
     tradeoff.

Differential Revision: https://reviews.llvm.org/D156767
2023-09-06 20:44:40 +00:00
Simon Pilgrim
84447c044f [DAG] Add SelectionDAG::isADDLike helper. NFC.
Make the DAGCombine helper global so we can more easily reuse it.
2023-09-06 16:54:25 +01:00
Florian Hahn
4c9223c770
[ComplexDeinterleaving] Use MapVector to fix codegen non-determinism. 2023-09-06 15:57:35 +01:00
Simon Pilgrim
e4d0e12099 [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2) (REAPPLIED)
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.

This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.

Alive2: https://alive2.llvm.org/ce/z/2UpSbJ

Differential Revision: https://reviews.llvm.org/D159198
2023-09-06 13:19:42 +01:00
Dmitri Gribenko
97bf104d97 Revert "[DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)"
This reverts commit b027ce0ab93060bc6cb79d5402d21520e8b93fb7.

This commit breaks Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll.
2023-09-06 11:28:55 +02:00
Simon Pilgrim
b027ce0ab9 [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.

This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.

Alive2: https://alive2.llvm.org/ce/z/2UpSbJ

Differential Revision: https://reviews.llvm.org/D159198
2023-09-06 10:06:21 +01:00
Ting Wang
71be020dda [SelectionDAG][PowerPC] Memset reuse vector element for tail store
On PPC there are instructions to store element from vector(e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail
constant in memset and constant splat array initialization.

This patch tries to explore these opportunities.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138883
2023-09-06 01:52:38 -04:00
Amara Emerson
6c31f20fee
[GlobalISel] Fold fmul x, 1.0 -> x (#65379) 2023-09-06 03:14:16 +08:00
Amara Emerson
08e04209d8
[GlobalISel] Commute G_FMUL and G_FADD constant LHS to RHS. (#65298) 2023-09-05 23:48:34 +08:00
Simon Pilgrim
5463503ae1 [DAG] Move scalar BITCAST constant folds from getNode to FoldConstantArithmetic 2023-09-05 13:11:20 +01:00
David Sherwood
50598f0ff4 [DAGCombiner][SVE] Add support for illegal extending masked loads
In some cases where the same mask is used for multiple
extending masked loads it can be more efficient to combine
the zero- or sign-extend into the load even if it's not a
legal or custom operation. This leads to splitting up the
extending load into smaller parts, which also requires
splitting the mask. For SVE at least this improves the
performance of the SPEC benchmark x264 slightly on
neoverse-v1 (~0.3%), and at least one other benchmark
improves by around 30%. The uplift for SVE seems due to
removing the dependencies (vector unpacks) introduced
between the loads and the vector operations, since this
should increase the level of parallelism.

See tests:

  CodeGen/AArch64/sve-masked-ldst-sext.ll
  CodeGen/AArch64/sve-masked-ldst-zext.ll

https://reviews.llvm.org/D159191
2023-09-05 10:41:21 +00:00
pvanhout
aaf6755631 [GlobalISel] Refactor Combiner API
Remove CodeGen leftovers from the old combiner backend and adapt the API to fit the new backend better.
It's now quite a bit closer to how InstructionSelector works.

- `CombinerInfo` is now a simple "options" struct.
- `Combiner` is now the base class of all TableGen'd combiner implementation.
    - Many fields have been moved from derived classes into that class.
    - It has been refactored to create & own the Observer and Builder.
- `tryCombineAll` TableGen'd method can now be renamed, which allows targets to implement the actual `tryCombineAll` call manually and do whatever they want to do before/after it.

Note: `CombinerHelper` needs to be mutable because none of its methods are const. This can be revisited later.

Depends on D158710

Reviewed By: aemerson, dsanders

Differential Revision: https://reviews.llvm.org/D158713
2023-09-05 08:19:05 +02:00
Amara Emerson
91746d15d2
[GlobalISel] Fix G_PTR_ADD immediate chain combine using the wrong im… (#65271) 2023-09-05 08:06:40 +08:00
Jay Foad
71ca53b6cf
[GlobalISel] Lower G_SHUFFLE_VECTOR with scalar result (#65275) 2023-09-04 13:32:43 -04:00
Amara Emerson
0065640f40 [GlobalISel] Look through a G_PTR_ADD's def register instead of it's source operand's
uses when looking for load/store users. This was a simple logic bug during translation
of the equivalent function in SelectionDAG:
```
    for (SDNode *Node : N->uses()) {
      if (auto *LoadStore = dyn_cast<MemSDNode>(Node)) {
```
2023-09-04 00:28:57 -07:00
Amara Emerson
1ef1eec8db [GlobalISel][NFC] Use GenericMachineInstrs in CombinerHelper::reassociationCanBreakAddressingModePattern(). 2023-09-03 21:17:31 -07:00
Matt Arsenault
65b40f273f RegAlloc: Rename MLRegalloc* files to use consistent captalization
The other regalloc related files use RegAlloc, not Regalloc.
2023-09-03 09:00:27 -04:00
Kazu Hirata
5fb990ac51 [SelectionDAG] Use isNullConstant (NFC) 2023-09-02 09:32:43 -07:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Philip Reames
685e1909e9 [LegalizeDAG] Use scalable aware idiom for checking for single element vector
NFC for fixed vectors (all that reaches here currently), and future proofing for scalable vectors.
2023-09-01 11:56:03 -07:00
Simon Pilgrim
15b561ed38 [DAG] Move STEP_VECTOR constant fold from getNode to FoldConstantArithmetic 2023-09-01 15:47:37 +01:00
Simon Pilgrim
1d47d5d67c [DAG] Move F16<->FP constant folds from getNode to FoldConstantArithmetic 2023-09-01 15:47:36 +01:00
Simon Pilgrim
4b9c2cf0a7 [DAG] Move INT<->FP constant folds from getNode to FoldConstantArithmetic 2023-09-01 14:02:02 +01:00
Matt Arsenault
a68d72a995 MachineVerifier: Don't crash in LiveIntervals checks on generic vregs
If llvm-reduce is going to unconditionally verify functions with
LiveIntervals, it needs to be tolerant of generic vregs.

https://reviews.llvm.org/D133813
2023-09-01 08:32:02 -04:00
Simon Pilgrim
2a81396b1b [DAG] SimplifyDemandedBits - add SMIN/SMAX KnownBits comparison analysis
Followup to D158364

Also, final fix for Issue #59902 which noted that the snippet should just return 1
2023-09-01 12:42:30 +01:00
Simon Pilgrim
aca8b9d0d5 [DAG] SimplifyDemandedBits - if we're only demanding the signbits, a MIN/MAX node can be simplified to a OR or AND node
Extension to the signbit case, if the signbits extend down through all the demanded bits then SMIN/SMAX/UMIN/UMAX nodes can be simplified to a OR/AND/AND/OR.

Alive2: https://alive2.llvm.org/ce/z/mFVFAn (general case)

Differential Revision: https://reviews.llvm.org/D158364
2023-09-01 10:56:32 +01:00
Matt Arsenault
dfc0ede1f8 MachineFunction: Add verify method that accepts LiveIntervals
This version can be used without a PassManager

https://reviews.llvm.org/D133784
2023-08-31 18:14:39 -04:00
Matt Arsenault
ad9d13d535 SelectionDAG: Swap operands of atomic_store
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.

There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.

I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.

https://reviews.llvm.org/D123143
2023-08-31 17:30:10 -04:00
Arthur Eubanks
2a2f02e19f [X86] Use 64-bit jump table entries for large code model PIC
With the large code model, the label difference may not fit into 32 bits.
Even if we assume that any individual function is no larger than 2^32
and use a difference from the function entry to the target destination,
things like BOLT can rearrange blocks (even if BOLT doesn't necessarily
work with the large code model right now).

set directives avoid static relocations in some 32-bit entry cases, but
don't worry about set directives for 64-bit jump table entries (we can
do that later if somebody really cares about it).

check-llvm in a bootstrapped clang with the large code model passes.

Fixes #62894

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D159297
2023-08-31 14:13:38 -07:00
Daniel Paoliello
0c5c7b52f0 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-31 12:06:50 -07:00
Konstantina Mitropoulou
17fc78e7a4 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.
This reverts commit 48fa79a503a7cf380f98b6335fbd349afae1bd86.

Reviewed By: brooksmoses

Differential Revision: https://reviews.llvm.org/D159240
2023-08-31 11:36:50 -07:00
Philip Reames
6a55aa5ff3 [CodeGen] Print invalid instead of crashing when dumping invalid MVT 2023-08-31 11:01:34 -07:00
Nick Desaulniers
2fad6e6985 [InlineAsm] wrap Kind in enum class NFC
Should add some minor type safety to the use of this information, since
there's quite a bit of metadata being laundered through an `unsigned`.

I'm looking to potentially add more bitfields to that `unsigned`, but I
find InlineAsm's big ol' bag of enum values and usage of `unsigned`
confusing, type-unsafe, and un-ergonomic. These can probably be better
abstracted.

I think the lack of static_cast outside of InlineAsm indicates the prior
code smell fixed here.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D159242
2023-08-31 08:54:51 -07:00
Igor Kirillov
e2cb07c322 [CodeGen] Fix incorrect insertion point selection for reduction nodes in ComplexDeinterleavingPass
When replacing ComplexDeinterleavingPass::ReductionOperation, we can do it
either from the Real or Imaginary part. The correct way is to take whichever
is later in the BasicBlock, but before the patch, we just always took the
Real part.

Fixes https://github.com/llvm/llvm-project/issues/65044

Differential Revision: https://reviews.llvm.org/D159209
2023-08-31 10:38:01 +00:00
David Green
58a2f839fd [AArch64][GISel] Expand coverage of FDiv and move into place.
This adds some more extensive test coverage for fdiv through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases and moving it into the position of the existing code which is no longer
needed.
2023-08-30 22:09:53 +01:00
Amara Emerson
c95ed6e492 [GlobalISel] Try to commute G_CONSTANT_FOLD_BARRIER LHS operands to RHS.
Differential Revision: https://reviews.llvm.org/D159097
2023-08-30 08:07:22 -07:00
Luke Lau
3a4ad45a2c [DAGCombiner] Combine trunc (splat_vector x) -> splat_vector (trunc x)
From the discussion in https://reviews.llvm.org/D158853, moving the truncate
into the splat helps more splatted scalar operands get selected on RISC-V, and
also avoids the need for splat_vector_parts on RV32.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D159147
2023-08-30 15:22:57 +01:00
Simon Pilgrim
376050db9f [DAG] Move some unary constant folds from getNode() to FoldConstantArithmetic()
We need to clean up some type handling before the remainder (int<->fp and bitcasts) can be moved over.
2023-08-30 13:59:28 +01:00
Simon Pilgrim
d037445f3a [DAG] visitSHL - use FoldConstantArithmetic to fold constants in (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold
Matches what we do in the (shl (mul x, c1), c2) -> (mul x, c1 << c2) fold as well as inside visitShiftByConstant
2023-08-29 18:52:24 +01:00
Craig Topper
299b1b4071 [SelectionDAG][RISCV] Teach getConstant to use SPLAT_VECTOR_PARTS if vXi64 SPLAT_VECTOR is legal but i64 scalars are not.
That matches how such a SPLAT_VECTOR would have been type legalized
so assume it is ok to use for creating constants after type legalization.

Still need some improvements to SPLAT_VECTOR lowering.

This overlaps with some of what D158742 was trying to fix.

Reviewed By: luke

Differential Revision: https://reviews.llvm.org/D158870
2023-08-29 09:22:17 -07:00
Serguei Katkov
a701b7e368 [CGP] Remove dead PHI nodes before elimination of mostly empty blocks
Before elimination of mostly empty block it makes sense to remove dead PHI nodes.
It open more opportunity for elimination plus eliminates dead code itself.

It appeared that change results in failing many unit tests and some of
them I've updated and for another one I disable this optimization.
The pattern I observed in the tests is that there is a infinite loop
without side effects. As a result after elimination of dead phi node all other
related instruction are also removed and tests stops to check what it is expected.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D158503
2023-08-29 04:35:06 +00:00
Daniel Hoekwater
ef1c25eb50 [CodeGen][AArch64] Don't split jump table basic blocks
Jump tables on AArch64 are label-relative rather than table-relative, so
having jump table destinations that are in different sections causes
problems with relocation. Jump table lookups have a max range of 1MB, so
all destinations must be in the same section as the lookup code. Both of
these restrictions can be mitigated with some careful and complex logic,
but doing so doesn't gain a huge performance benefit.

Efficiently ensuring jump tables are correct and can be compressed on
AArch64 is a TODO item. In the meantime, don't split blocks that can
cause problems.

Differential Revision: https://reviews.llvm.org/D157124
2023-08-28 21:47:57 +00:00
Luke Lau
8f1d1e2b61 [SDAG] Add computeKnownBits support for ISD::SPLAT_VECTOR_PARTS
We can work out the known bits for a given lane by concatenating the known bits of each scalar operand.

In the description of ISD::SPLAT_VECTOR_PARTS in ISDOpcodes.h it says that the
total size of the scalar operands must cover the output element size, but I've
added a stricter assertion here that the total width of the scalar operands
must be exactly equal to the element size. It doesn't seem to trigger, and I'm
not sure if there any targets that use SPLAT_VECTOR_PARTS for anything other
than v4i32 -> v2i64 splats.

We also need to include it in isTargetCanonicalConstantNode, otherwise
returning the known bits introduces an infinite combine loop.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158852
2023-08-28 10:35:58 +01:00
Luke Lau
6e4860f5d0 [SDAG] Add SimplifyDemandedBits support for ISD::SPLAT_VECTOR
This improves some cases where a splat_vector uses a build_pair that can be
simplified, e.g:

(rotl x:i64, splat_vector (build_pair x1:i32, x2:i32))

rotl only demands the bottom 6 bits, so this patch allows it to simplify it to:

(rotl x:i64, splat_vector (build_pair x1:i32, undef:i32))

Which in turn improves some cases where a splat_vector_parts is lowered on
RV32.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D158839
2023-08-28 10:35:56 +01:00
Kazu Hirata
25ac7a0d11 [llvm] Use SmallDenseMap::contains (NFC) 2023-08-27 00:18:16 -07:00
Aiden Grossman
856c463839 [CodeGenPrepare] Fix modification status bug
This was exposed in https://reviews.llvm.org/D158250 in
CodeGen/X86/statepoint-stack-usage.ll. There was no update to the
modification status in this section.

Co-Authored-By: nikic

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D158898
2023-08-26 00:31:40 -07:00
Arthur Eubanks
0a4fc4ac1c Revert "Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables"
This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3.

Causes crashes, see comments in https://reviews.llvm.org/D149367.

Some follow-up fixes are also reverted:

This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085.
This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6.
This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.
2023-08-25 18:34:15 -07:00
Snehasish Kumar
3dbabeadd6 [CodeGen] Remove unused option in MachineFunctionSplitter.
The option was added in github.com/llvm/llvm-project/commit/90ab85a but it doesn't seem to be used. The triple check has been removed so this shouldn't be required going forward.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D158885
2023-08-25 21:24:28 +00:00
Kazu Hirata
5966079cf4 [CodeGen] Fix a warning
This patch fixes:

  llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp:3468:14: error:
  variable 'foundJTI' set but not used
  [-Werror,-Wunused-but-set-variable]
2023-08-25 13:18:58 -07:00