The current MIR cycle sinking capabilities are rather limited. It only
support sinking copies into a single successor block while obeying
limits.
This opt-in feature adds a more aggressive option, that is not limited
to the above concerns. The feature will try to "sink" by duplicating any
top-level preheader instruction (that we are sure is safe to sink) into
any user block, then does some dead code cleanup. In particular, this is
useful for high RP situations when loop bodies have control flow.
Previously, AArch64 used pattern matching to support
llvm.vector.(de)interleave of 2 and 4; RISC-V only supported
(de)interleave of 2.
This patch consolidates the logics in these two targets by factoring out
the common factor calculations into the InterleaveAccess Pass.
Provide isDead interface for access to ad-hoc isDead queries.
LivePhysRegs is optional: if not provided, pessimistically check
deadness of a single MI without doing the LivePhysReg walk; if provided
it is assumed to be at the position of MI.
We have two forceExpandWideMUL functions. One takes the low and high
half of 2 inputs and calculates the low and high half of their product.
This does not calculate the full 2x width product.
The other signature takes 2 inputs and calculates the low and high half
of their full 2x width product. Previously it did this by sign/zero
extending the inputs to create the high bits and then calling the other
function.
We can instead copy the algorithm from the other function and use the
Signed flag to determine whether we should do SRA or SRL. This avoids
the need to multiply the high part of the inputs and add them to the
high half of the result. This improves the generated code for signed
multiplication.
This should improve the performance of #123262. I don't know yet how
close we will get to gcc.
This adjusts the threshold logic added in #78582 to only trigger for
cases where there are actually phis to duplicate in either TailBB or in
one of the successors.
In cases there are no phis, we only have to pay the cost of extra edges,
but have no explosion in PHI related instructions.
This improves performance of Python on some inputs by 2-3% on Apple
Silicon CPUs.
PR: https://github.com/llvm/llvm-project/pull/116072
I was studying the code here and realized that the comments were talking
about grouping by basic blocks when the code was grouping by Function.
Fix the comments so they reflect what the code is actually doing.
We use variable locations such as DBG_VALUE $xmm0 as shorthand to refer
to "the low lane of $xmm0", and this is reflected in how DWARF is
interpreted too. However InstrRefBasedLDV tries to be smart and
interprets such a DBG_VALUE as a 128-bit reference. We then issue a
DW_OP_deref_size of 128 bits to the stack, which isn't permitted by
DWARF (it's larger than a pointer).
Solve this for now by not using DW_OP_deref_size if it would be illegal.
Instead we'll use DW_OP_deref, and the consumer will load the variable
type from the stack, which should be correct.
There's still a risk of imprecision when LLVM decides to use smaller or
larger value types than the source-variable type, which manifests as
too-little or too-much memory being read from the stack. However we
can't solve that without putting more type information in debug-info.
fixes#64093
Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.
This should be a NFC change for targets that enable AA during codegen
(such as AArch64), but also give a nice compile-time improvement in some
cases. See:
https://github.com/llvm/llvm-project/pull/123787#issuecomment-2606797041
Note: This follows Nikita's suggestion on #123787.
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
proposes to partition static data sections.
This patch introduces a codegen pass. This patch produces jump table
hotness in the in-memory states (machine jump table info and entries).
Target-lowering and asm-printer consume the states and produce `.hot`
section suffix. The follow up PR
https://github.com/llvm/llvm-project/pull/122215 implements such
changes.
---------
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
At some corner cases, the cloned MI still retains an old slot index,
which leads to the compiler crashing. This patch update the slot index
map before delete the recycled MI.
https://github.com/llvm/llvm-project/issues/123165
There's a regression with one of the bootstrap builds for x86.
I'll revert this while I investigate.
This reverts commit 4df6d3df24ae9cff07c70c96a1663cbba6e1dca5.
This PR aims to reland work done by @arsenm which was previously
reverted due to some tangentially related scheduler issues as discussed
on #76416.
This PR cherry-picks the original commit (0e46b49de433), and adds
another patch on top with the following changes:
* The code in `updateRegDefsUses` now updates subranges when
subreg-liveness-tracking is enabled.
* When adding an implicit-def operand for the super-register,
the code in `reMaterializeTrivialDef` which tries to remove
undefined subranges should now take into account that the lanes
from the super-reg are no longer undefined.
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
.gfids$y contains a list of indirect calls for Control Flow Guard. This
wasn't working properly for ARM64EC: direct calls were being treated as
indirect calls. Make sure we correctly filter out direct calls.
This improves the protection from Control Flow Guard, and also fixes a
link error when using certain functions from oldnames.lib.
Today, emitLinkerDirectives is private to TLOFCOFF-- it isolates parsing
and processing of the linker options. Similar processing is also done by
other TLOFs inline within emitModuleMetadata. This patch promotes
emitLinkerDirectives to a virtual (public) method so that this handling
is similarly isolated in the other TLOFs.
This also enables downstream targets to override just this handling
instead of the whole of emitModuleMetadata.
For shuffle vector splats with undef lanes in the mask,
this was introducing real values. Filter out build_vector
results based on the undef elements in the mask.
This avoids AMDGPU test regressions in a future change.
test/CodeGen/X86/urem-seteq-illegal-types.ll looks worse
but I didn't investigate.
PR https://github.com/llvm/llvm-project/pull/118823 added a
DAG combine for extracting elements of a vector returned from
SETCC, however it doesn't correctly deal with the case where
the vector element type is not i1. In this case we have to
take account of the boolean contents, which are represented
differently between vectors and scalars. The code now
explicitly performs an inreg sign extend in order to get the
same result.
Fixes https://github.com/llvm/llvm-project/issues/121372
This is preparation for extending ReachingDefAnalysis to stack slots. We
should use `Register`, not `MCRegister` for something that can be a
physical register or a stack slot.
Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder.
The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.
Also add support for new relocation types required by debug information.
Constants have been taken from CodeView Symbolic Debug Information
Specification.