37098 Commits

Author SHA1 Message Date
Kazu Hirata
9fecb4f907
[CodeGen] Fix a warning
This patch fixes:

  llvm/lib/CodeGen/MachineSink.cpp:1667:22: error: unused variable
  'Preheader' [-Werror,-Wunused-variable]
2025-01-23 19:37:28 -08:00
Matt Arsenault
0ef39a882b
MachineCSE: Remove check for subreg on a def operand (#124095)
There are no subregister defs in SSA.
2025-01-24 09:35:30 +07:00
Jeffrey Byrnes
acb7859f07
[MachineSink] Extend loop sinking capability (#117247)
The current MIR cycle sinking capabilities are rather limited. They only
support sinking copies into a single successor block while obeying
limits.

This opt-in feature adds a more aggressive option that is not limited
by the above concerns. The feature will try to "sink" by duplicating any
top-level preheader instruction (that we are sure is safe to sink) into
any user block, and then does some dead code cleanup. In particular, this
is useful for high register pressure (RP) situations when loop bodies
have control flow.
2025-01-23 17:08:23 -08:00
Min-Yih Hsu
bc74a1edbe
[IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863)
Previously, AArch64 used pattern matching to support
llvm.vector.(de)interleave of 2 and 4; RISC-V only supported
(de)interleave of 2.

This patch consolidates the logic in these two targets by factoring out
the common factor calculations into the InterleaveAccess Pass.
2025-01-23 15:27:51 -08:00
Jeffrey Byrnes
f2942b9077
[CodeGen] NFC: Move isDead to MachineInstr (#123531)
Provide isDead interface for access to ad-hoc isDead queries.
LivePhysRegs is optional: if not provided, pessimistically check
deadness of a single MI without doing the LivePhysReg walk; if provided
it is assumed to be at the position of MI.
2025-01-23 12:54:29 -08:00
Craig Topper
e30a4fc3e2
[TargetLowering] Improve one signature of forceExpandWideMUL. (#123991)
We have two forceExpandWideMUL functions. One takes the low and high
half of 2 inputs and calculates the low and high half of their product.
This does not calculate the full 2x width product.

The other signature takes 2 inputs and calculates the low and high half
of their full 2x width product. Previously it did this by sign/zero
extending the inputs to create the high bits and then calling the other
function.

We can instead copy the algorithm from the other function and use the
Signed flag to determine whether we should do SRA or SRL. This avoids
the need to multiply the high part of the inputs and add them to the
high half of the result. This improves the generated code for signed
multiplication.

This should improve the performance of #123262. I don't know yet how
close we will get to gcc.
2025-01-23 12:49:35 -08:00
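To illustrate the idea in the forceExpandWideMUL entry above, here is a minimal sketch in plain C++, not SelectionDAG code and not the LLVM implementation. It computes the low and high halves of the full 64-bit product of two 32-bit inputs from 16-bit pieces, and uses a `Signed` flag only to choose between an arithmetic shift (SRA) and a logical shift (SRL). The names `WideProduct`, `shiftRight16`, and `expandWideMul` are hypothetical and chosen for this example.

```cpp
#include <cstdint>

// Hypothetical result type: the two 32-bit halves of a 64-bit product.
struct WideProduct {
  uint32_t Lo;
  uint32_t Hi;
};

// Shift right by 16 bits: arithmetic (SRA) if Signed, logical (SRL) otherwise.
static uint32_t shiftRight16(uint32_t X, bool Signed) {
  uint32_t R = X >> 16;
  if (Signed && (X & 0x80000000u))
    R |= 0xFFFF0000u; // replicate the sign bit into the vacated bits
  return R;
}

// Low and high halves of the full 2x-width product of U and V.
// All arithmetic wraps modulo 2^32, as it would on 32-bit registers.
WideProduct expandWideMul(uint32_t U, uint32_t V, bool Signed) {
  // Split each input into a low half and a (sign- or zero-extended) high half.
  uint32_t U0 = U & 0xFFFFu, U1 = shiftRight16(U, Signed);
  uint32_t V0 = V & 0xFFFFu, V1 = shiftRight16(V, Signed);

  uint32_t W0 = U0 * V0;             // low x low, always unsigned
  uint32_t T = U1 * V0 + (W0 >> 16); // first cross term plus carry
  uint32_t W1 = T & 0xFFFFu;
  uint32_t W2 = shiftRight16(T, Signed);
  W1 = U0 * V1 + W1;                 // second cross term

  return {U * V, U1 * V1 + W2 + shiftRight16(W1, Signed)};
}
```

For example, `expandWideMul(0xFFFFFFFFu, 0xFFFFFFFFu, true)` treats both inputs as -1 and yields a high half of 0, while the same call with `Signed` set to false yields a high half of 0xFFFFFFFE.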
Florian Hahn
0d0190815d
[TailDup] Allow large number of predecessors/successors without phis. (#116072)
This adjusts the threshold logic added in #78582 to only trigger for
cases where there are actually phis to duplicate in either TailBB or in
one of the successors.

In cases where there are no phis, we only have to pay the cost of extra
edges, but have no explosion in PHI related instructions.

This improves performance of Python on some inputs by 2-3% on Apple
Silicon CPUs.

PR: https://github.com/llvm/llvm-project/pull/116072
2025-01-23 18:24:20 +00:00
Kazu Hirata
bb019dd165
[CodeGen] Avoid repeated hash lookups (NFC) (#124078)
2025-01-23 08:46:19 -08:00
Michael Maitland
7db4ba3916
[GlobalMerge][NFC] Fix inaccurate comments (#124136)
I was studying the code here and realized that the comments were talking
about grouping by basic blocks when the code was grouping by Function.
Fix the comments so they reflect what the code is actually doing.
2025-01-23 11:36:53 -05:00
Matt Arsenault
fb3fa41aee
MachineRegisterInfo: Use variable for TRI
2025-01-23 20:29:25 +07:00
Jeremy Morse
cb714e74cc
[DebugInfo][InstrRef] Avoid producing broken DW_OP_deref_sizes (#123967)
We use variable locations such as DBG_VALUE $xmm0 as shorthand to refer
to "the low lane of $xmm0", and this is reflected in how DWARF is
interpreted too. However InstrRefBasedLDV tries to be smart and
interprets such a DBG_VALUE as a 128-bit reference. We then issue a
DW_OP_deref_size of 128 bits to the stack, which isn't permitted by
DWARF (it's larger than a pointer).

Solve this for now by not using DW_OP_deref_size if it would be illegal.
Instead we'll use DW_OP_deref, and the consumer will load the variable
type from the stack, which should be correct.

There's still a risk of imprecision when LLVM decides to use smaller or
larger value types than the source-variable type, which manifests as
too-little or too-much memory being read from the stack. However we
can't solve that without putting more type information in debug-info.

fixes #64093
2025-01-23 10:47:15 +00:00
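The rule described in the DW_OP_deref_size entry above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the InstrRefBasedLDV code: the function name `emitStackLoad` and its parameters are hypothetical, and the only facts taken from DWARF are the two opcode values and the constraint that the size operand of DW_OP_deref_size must not exceed the size of an address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Opcode values as defined by the DWARF specification.
constexpr uint8_t DW_OP_deref = 0x06;
constexpr uint8_t DW_OP_deref_size = 0x94;

// Hypothetical helper: choose the expression bytes used to load a value
// of ValueSizeBits from the stack on a target with PointerSizeBits
// addresses.
std::vector<uint8_t> emitStackLoad(unsigned ValueSizeBits,
                                   unsigned PointerSizeBits) {
  assert(ValueSizeBits % 8 == 0 && PointerSizeBits % 8 == 0);
  unsigned ValueSizeBytes = ValueSizeBits / 8;
  unsigned PointerSizeBytes = PointerSizeBits / 8;

  // A deref_size larger than an address is not permitted, so fall back
  // to a plain deref and let the consumer use the variable's type.
  if (ValueSizeBytes > PointerSizeBytes)
    return {DW_OP_deref};

  return {DW_OP_deref_size, static_cast<uint8_t>(ValueSizeBytes)};
}
```

With a 64-bit pointer, a 128-bit value such as a full $xmm0 reference would take the plain DW_OP_deref path, while a 32-bit value would still get DW_OP_deref_size with an operand of 4.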
Mats Jun Larsen
d7c14c8f97
[IR] Replace of PointerType::getUnqual(Type) with opaque version (NFC) (#123909)
Follow up to https://github.com/llvm/llvm-project/issues/123569
2025-01-23 18:23:05 +09:00
Benjamin Maxwell
778138114e
[SDAG] Use BatchAAResults for querying alias analysis (AA) results (#123934)
Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.

This should be an NFC change for targets that enable AA during codegen
(such as AArch64), but should also give a nice compile-time improvement
in some cases. See:
https://github.com/llvm/llvm-project/pull/123787#issuecomment-2606797041

Note: This follows Nikita's suggestion on #123787.
2025-01-23 09:16:09 +00:00
Alan Li
220004d2f8
[GISel] Add more FP opcodes to CSE (#123949)
Resubmit; the previous PR had compilation issues.
2025-01-22 23:00:08 -08:00
Mingming Liu
de209fa11b
[CodeGen] Introduce Static Data Splitter pass (#122183)
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
proposes to partition static data sections.

This patch introduces a codegen pass. This patch produces jump table
hotness in the in-memory states (machine jump table info and entries).
Target-lowering and asm-printer consume the states and produce `.hot`
section suffix. The follow up PR
https://github.com/llvm/llvm-project/pull/122215 implements such
changes.

---------

Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
2025-01-22 21:06:46 -08:00
Matt Arsenault
15c2d4baf1
PeepholeOpt: Remove check for subreg index on a def operand (#123943)
This is looking at operand 0 of a REG_SEQUENCE, which can never
have a subregister index.
2025-01-23 09:06:26 +07:00
Matt Arsenault
2646e2d487
PeepholeOpt: Stop allocating tiny helper classes (NFC) (#123936)
This was allocating tiny helper classes for every instruction
visited. We can just dispatch over the cases in the visitor
function instead.
2025-01-23 09:00:08 +07:00
Matt Arsenault
6f69adeed6
PeepholeOpt: Remove null TargetRegisterInfo check (#123933)
This cannot happen. Also simplify the LaneBitmask check from !none
to any.
2025-01-23 08:57:04 +07:00
Matt Arsenault
23d2a1862a
PeepholeOpt: Remove unnecessary check for null TargetInstrInfo (#123929)
This can never happen.
2025-01-23 08:46:59 +07:00
Hua Tian
a9d2834508
[llvm][CodeGen] Fix the issue caused by live interval checking in window scheduler (#123184)
In some corner cases, the cloned MI still retains an old slot index,
which leads to a compiler crash. This patch updates the slot index
map before deleting the recycled MI.

https://github.com/llvm/llvm-project/issues/123165
2025-01-23 09:39:03 +08:00
Ellis Hoag
b1943f40e7
[BranchFolding] Remove getBranchDebugLoc() (#114613)
2025-01-22 09:50:49 -08:00
Craig Topper
9e6494c0fb
[CodeGen] Rename RegisterMaskPair to VRegMaskOrUnit. NFC (#123799)
This holds a physical register unit or virtual register and mask.

While I was here I've used emplace_back and removed an unneeded use of a
template.
2025-01-22 09:11:22 -08:00
Danial Klimkin
c938436f71
Revert "[GISel] Add more FP opcodes to CSE (#123624)" (#123954)
This reverts commit 43177b524ee06dfc09cbc357ff277d4f53f5dc15.
2025-01-22 16:21:05 +01:00
lialan
43177b524e
[GISel] Add more FP opcodes to CSE (#123624)
This fixes #122724
2025-01-22 06:20:42 -08:00
Sander de Smalen
6b1db79887
Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#123632)"
There's a regression with one of the bootstrap builds for x86.
I'll revert this while I investigate.

This reverts commit 4df6d3df24ae9cff07c70c96a1663cbba6e1dca5.
2025-01-22 10:11:32 +00:00
Sander de Smalen
4df6d3df24
Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#123632)
This PR aims to reland work done by @arsenm which was previously
reverted due to some tangentially related scheduler issues as discussed
on #76416.

This PR cherry-picks the original commit (0e46b49de433), and adds
another patch on top with the following changes:

* The code in `updateRegDefsUses` now updates subranges when
  subreg-liveness-tracking is enabled.

* When adding an implicit-def operand for the super-register,
  the code in `reMaterializeTrivialDef` which tries to remove
  undefined subranges should now take into account that the lanes
  from the super-reg are no longer undefined.

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-01-22 09:07:46 +00:00
Elizaveta Noskova
3088c31699
[llvm] Add NCD search on Array of basic blocks (NFC) (#119355)
Shrink-Wrap points split Part 2.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581

Part 1: https://github.com/llvm/llvm-project/pull/117862
Part 3: https://github.com/llvm/llvm-project/pull/119357
Part 4: https://github.com/llvm/llvm-project/pull/119358
Part 5: https://github.com/llvm/llvm-project/pull/119359
2025-01-22 11:55:02 +03:00
Kazu Hirata
19a7fe03b4
[CodeGen] Avoid repeated hash lookups (NFC) (#123894)
2025-01-22 00:17:55 -08:00
Eli Friedman
d540ebf6cb
[ARM64EC] Avoid emitting unnecessary symbol references with /guard:cf. (#123235)
.gfids$y contains a list of indirect calls for Control Flow Guard. This
wasn't working properly for ARM64EC: direct calls were being treated as
indirect calls. Make sure we correctly filter out direct calls.

This improves the protection from Control Flow Guard, and also fixes a
link error when using certain functions from oldnames.lib.
2025-01-21 16:29:23 -08:00
Jason Eckhardt
7cf8addc2d
[TLOF][NFC] Make emitLinkerDirectives virtual and public. (#123773)
Today, emitLinkerDirectives is private to TLOFCOFF-- it isolates parsing
and processing of the linker options. Similar processing is also done by
other TLOFs inline within emitModuleMetadata. This patch promotes
emitLinkerDirectives to a virtual (public) method so that this handling
is similarly isolated in the other TLOFs.

This also enables downstream targets to override just this handling
instead of the whole of emitModuleMetadata.
2025-01-21 18:24:33 -06:00
Vinicius Tadeu Zein
6ab9dafec8
[clang] Implement #pragma clang section on COFF targets (#112714)
This patch implements the directive #pragma clang section on COFF targets
with the exact same features available on ELF and Mach-O.
2025-01-21 16:12:58 -08:00
Craig Topper
cdd321462a
[TargetLowering] Use getShiftAmountConstant. NFC (#123802)
Previously we always used the pointer size which might need to be
legalized on some targets.
2025-01-21 12:05:52 -08:00
Matt Arsenault
5e79ae60a6
DAG: Fix vector_shuffle -> splat fold defining undef lanes (#123596)
For shuffle vector splats with undef lanes in the mask,
this was introducing real values. Filter out build_vector
results based on the undef elements in the mask.

This avoids AMDGPU test regressions in a future change.

test/CodeGen/X86/urem-seteq-illegal-types.ll looks worse
but I didn't investigate.
2025-01-21 23:55:50 +07:00
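The vector_shuffle -> splat entry above is about keeping undef lanes undef. A minimal sketch of that filtering in plain C++ follows. It does not use SelectionDAG types: `Lane`, `foldShuffleToSplat`, and the convention that a negative mask entry means "undef" are assumptions made for this illustration.

```cpp
#include <optional>
#include <vector>

// Hypothetical lane model: std::nullopt stands for an undef lane.
using Lane = std::optional<int>;

// Build the splat result for a shuffle whose defined mask entries all
// select the same element. Negative mask entries are treated as undef
// and must stay undef instead of receiving the splat value.
std::vector<Lane> foldShuffleToSplat(int SplatValue,
                                     const std::vector<int> &Mask) {
  std::vector<Lane> Result;
  Result.reserve(Mask.size());
  for (int M : Mask)
    Result.push_back(M < 0 ? Lane{} : Lane{SplatValue});
  return Result;
}
```

Filling every lane with `SplatValue` regardless of the mask would correspond to the behaviour the commit describes as introducing real values.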
Craig Topper
f5f32cef61
[CodeGen] Use MCRegister instead of MCPhysReg in RegisterMaskPair. NFC (#123688)
Update some other places to avoid implicit conversions this introduces,
but I probably missed some.
2025-01-21 07:04:35 -08:00
Craig Topper
c3d820553f
[RegAllocFast] Don't convert MCRegUnit to MCRegister. NFC (#123705)
2025-01-21 07:03:23 -08:00
lialan
5d9c717597
[GISel] Fold shifts to constant result. (#123510)
This resolves #123212
2025-01-21 05:10:45 -08:00
David Sherwood
50bfa85d79
[DAGCombiner] Fix scalarizeExtractedBinOp for some SETCC cases (#123071)
PR https://github.com/llvm/llvm-project/pull/118823 added a
DAG combine for extracting elements of a vector returned from
SETCC, however it doesn't correctly deal with the case where
the vector element type is not i1. In this case we have to
take account of the boolean contents, which are represented
differently between vectors and scalars. The code now
explicitly performs an inreg sign extend in order to get the
same result.

Fixes https://github.com/llvm/llvm-project/issues/121372
2025-01-21 10:31:56 +00:00
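The boolean-contents mismatch in the scalarizeExtractedBinOp entry above can be illustrated with a small sketch. It assumes a target where a scalar comparison yields 0 or 1 while a true vector lane is all ones; that choice is target-dependent in LLVM, so treat it as an assumption of this example. The names `signExtendInReg` and `extractedCompareLane` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low Bits bits of X to the full 32-bit width, leaving
// the result in the same register width ("in-register" extension).
uint32_t signExtendInReg(uint32_t X, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 32);
  uint32_t Mask = Bits == 32 ? 0xFFFFFFFFu : ((1u << Bits) - 1u);
  uint32_t SignBit = 1u << (Bits - 1);
  uint32_t Low = X & Mask;
  return (Low ^ SignBit) - SignBit; // wraps modulo 2^32
}

// Scalar replacement for "extract one lane of a vector compare":
// the scalar compare gives 0 or 1 (assumed scalar boolean contents);
// extending from bit 0 turns 1 into all ones, matching the value the
// lane would have held in the vector result.
uint32_t extractedCompareLane(int32_t A, int32_t B) {
  uint32_t ScalarBool = (A == B) ? 1u : 0u;
  return signExtendInReg(ScalarBool, 1);
}
```

Without the extension, a true result would be 1 instead of 0xFFFFFFFF, which is the kind of mismatch the commit describes for element types other than i1.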
Kazu Hirata
a588e20280
[SelectionDAG] Avoid repeated hash lookups (NFC) (#123697)
2025-01-21 16:24:49 +08:00
Mikhail Gudim
5cde6d2fdf
[ReachingDefAnalysis][NFC] Replace MCRegister with Register (#123626)
This is preparation for extending ReachingDefAnalysis to stack slots. We
should use `Register`, not `MCRegister` for something that can be a
physical register or a stack slot.
2025-01-21 01:04:18 -05:00
Craig Topper
1434313bd8
[LiveRegMatrix] Use MCRegUnit instead of MCRegister for register unit. NFC
MCRegister should be used for registers, not register units.
2025-01-20 10:57:34 -08:00
Kazu Hirata
efae9f3c21
[MIRParser] Avoid repeated map lookups (NFC) (#123561)
2025-01-20 10:15:27 -08:00
Kazu Hirata
bc1e699d9f
[CodeGen] Avoid repeated hash lookups (NFC) (#123557)
2025-01-20 10:13:08 -08:00
Alex MacLean
3606876b67
[SDAG] Fix CSE for ADDRSPACECAST nodes (#122912)
Correct CSE in SelectionDAG can make DAG combining more effective and
reduce the size of the DAG, and thus should improve compile time.
2025-01-20 09:09:22 -08:00
Mats Jun Larsen
416f1c465d
[IR] Replace of PointerType::get(Type) with opaque version (NFC) (#123617)
In accordance with https://github.com/llvm/llvm-project/issues/123569

In order to keep the patch at a reasonable size, this PR only covers
the llvm subproject, unittests excluded.
2025-01-21 00:32:56 +09:00
Graham Hunter
d9f165ddea
[SDAG] Add an ISD node to help lower vector.extract.last.active (#118810)
Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder.

The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.
2025-01-20 12:57:05 +00:00
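The split described in the vector.extract.last.active entry above, where one step finds the index of the last active mask element and a separate step extracts the element and handles the passthru, can be sketched on plain vectors as follows. This is a scalar model for illustration only: the function names are hypothetical, and returning `std::nullopt` for an all-false mask is a modelling choice of this sketch, not a statement about what the ISD node returns in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Step 1: index of the last active (true) mask element, if any.
std::optional<std::size_t> findLastActive(const std::vector<bool> &Mask) {
  for (std::size_t I = Mask.size(); I-- > 0;)
    if (Mask[I])
      return I;
  return std::nullopt;
}

// Step 2: extract the element at that index, or fall back to PassThru
// when no mask element is active.
int extractLastActive(const std::vector<int> &Data,
                      const std::vector<bool> &Mask, int PassThru) {
  assert(Data.size() == Mask.size());
  std::optional<std::size_t> Idx = findLastActive(Mask);
  return Idx ? Data[*Idx] : PassThru;
}
```

Keeping the index search separate from the extraction mirrors the division of labour the commit describes between the new node and existing ones.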
Akshat Oke
3ace18d5c0
[CodeGen] MachineFunctionSplitter: Add missing initializer (#123564)
This registers the pass with PassRegistry so we can use -start-before
and other options for machine-function-splitter.
2025-01-20 16:56:46 +05:30
yingopq
754ed95b66
[Mips] Fix compiler crash when returning fp128 after calling a function returning { i8, i128 } (#117525)

Fixes https://github.com/llvm/llvm-project/issues/96432.
2025-01-20 16:47:40 +08:00
Hervé Poussineau
be68f35bf5
[MC][CodeGen][Mips] Add CodeView mapping (#120877)
Also add support for new relocation types required by debug information.

Constants have been taken from CodeView Symbolic Debug Information
Specification.
2025-01-20 15:00:24 +08:00
Craig Topper
b7eee2c3fe
[CodeGen] Remove some implicit conversions of MCRegister to unsigned by using(). NFC
Many of these are indexing BitVectors or something where we can't
use MCRegister and need the register number.
2025-01-19 13:18:04 -08:00
Kazu Hirata
3d15bfb40c
[CodeGen] Avoid repeated hash lookups (NFC) (#123500)
2025-01-19 10:57:25 -08:00