This fixes#108936, but the calling convention doesn't match with GCC. I
doubt we have such a lib function for now, so leave the calling
convention as is.
Return SDValue() so we can notify the caller we did all replacements.
Restore the getNumValues() == 1 check in the assert in the caller now
that all handles only return nodes with a single result.
Reverted due to large .debug_line size regressions for some
configurations; work currently in place to improve the output of this
behaviour in PR #108251.
This patch also modifies two tests that were created or modified after
the original commit landed and are affected by the revert:
llvm/test/CodeGen/X86/pseudo_cmov_lower2.ll
llvm/test/DebugInfo/X86/empty-line-info.ll
This reverts commit 5fef40c2c477e92187bd4e5c18091eca6b8465cc.
Instead, return SDValue() to tell the caller to do the unrolling. This
is consistent with how some other handler work. Especially the handlers
that live in TLI.
ExpandBITREVERSE was rewritten to not take the Results vector an
argument.
This patch adds support for the missing STRICT_UINT_TO_FP and
STRICT_SINT_TO_FP for riscv and adds a test case for rv32 which was
previously crashing.
The code is in line with how other strict_* nodes are handled
(e.g., getting op(1) instead of op(0) when it's a strict node, as op(0)
in a strict node is the entry token).
The implementation was missing the fact that `G_EXTRACT_SUBVECTOR`
destination and source vector can be different types.
Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate
the correct opcode.
Clarify the G_EXTRACT_SUBVECTOR specification.
Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select,
under Neon we can use the arithmetic expansion to generate fewer
instructions. Notably it also prevents the scalarization of vselect
during vector-legalization.
This is an implementation of the saturating fp to int conversions for
GlobalISel. On AArch64 the converstion instrctions work this way,
producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is
ported from SDAG.
AArch64 has a lot of existing tests for fptosi_sat, covering a wide
range of types. I have tried to make most of them work all at once, but
a few fall back due to other missing features such as f128 handling for
min/max.
When we build a node with illegal type which has a user, it's possible
that it can end up being processed by the DAG combiner later before it's
removed, which can trigger an assert expecting the types to be legalized
already.
The InitUndef pass works around a register allocation issue, where undef
operands can be allocated to the same register as early-clobber result
operands. This may lead to ISA constraint violations, where certain
input and output registers are not allowed to overlap.
Originally this pass was implemented for RISCV, and then extended to ARM
in #77770. I've since removed the target-specific parts of the pass in
#106744 and #107885. This PR reduces the pass to use a single
requiresDisjointEarlyClobberAndUndef() target hook and enables it by
default. The hook is disabled for AMDGPU, because overlapping
early-clobber and undef operands are known to be safe for that target,
and we get significant codegen diffs otherwise.
The motivating case is the one in arm64-ldxr-stxr.ll, where we were
previously incorrectly allocating a stxp input and output to the same
register.
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.
This PR has two parts:
* `SDPatternMatch/MatchContext`: Modify a little bit the code to match
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of only `m_Opc`
supported. Some tests were added to ensure no regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detect and rewrite the
patterns using matching context.
Remaining Tasks:
- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead ?
- [ ] Style: Add a visitVP_SUB ?? Move `foldSubCtlzNot` in another
location for style consistency purpose ?
@topperc
---------
Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
This is a follow-up to #92289 that adds lowering of the new
`@llvm.experimental.vector.compress` intrinsic on x86 with AVX512
instructions. This intrinsic maps directly to `vpcompress`.
Once we modernize CopyInfo with default member initializations,
Copies.insert({Unit, ...})
becomes equivalent to:
Copies.try_emplace(Unit)
which we can simplify further down to Copies[Unit].
Create an FP_EXTEND instead of handling the soft promote directly. This
FP_EXTEND will be visited and soft promoted itself.
This removes a zero extend from the generated code when the f32 type is
itself softened. Previously we softened it as an fp16_to_fp which sees
the operand as an integer type so we extend it. When we soften the
result as an fp_extend we see the source as f16 and don't extend. It
only becomes an integer inside call lowering not by type legalization.
If this extend is really necessary, then we have an issue when an
f16->f32 fp_extend exists in the source and f32 needs to be softened.
This simplifies part of #102503.
Multiple invocations of the pass could interfere with eachother,
preventing some undefs being initialised.
I found it very difficult to create a unit test for this due to it being
dependent on particular allocations of a previous function. However, the
bug can be observed here: https://godbolt.org/z/7xnMo41Gv with the
creation of the illegal instruction `vnsrl.wi v9, v8, 0`
InitUndef currently always computes DeadLaneDetector, but only actually
uses it if subreg liveness is enabled for the target. Make the
calculation optional to avoid an unnecessary compile-time impact for
targets that don't enable subreg liveness.
trunc (binop X, C) --> binop (trunc X, trunc C) --> binop (trunc X, C`)
Try to narrow the width of math or bitwise logic instructions by pulling
a truncate ahead of binary operators.
Vx and Nx cores consider 32-bit and 64-bit basic arithmetic equal in
costs.
The InitUndef pass currently uses the getLargestSuperClass() hook (which
is only used by that pass) to chose the register to initialize. This was done
to reduce the number of undef init pseudos needed, e.g. so that the vrnov0
regclass would use the same pseudo as v0. After #106744 we use a single
generic pseudo, so this is no longer necessary.
The Hi result is sometimes calculated a different way and this
node goes unused. Defer creation until we know for sure it is neeeded.
The test changes is because the node creation order changed the names
in the debug output.
This commit introduces support for outlining functions across modules
using codegen data generated from previous codegen. The codegen data
currently manages the outlined hash tree, which records outlining
instances that occurred locally in the past.
The machine outliner now operates in one of three modes:
1. CGDataMode::None: This is the default outliner mode that uses the
suffix tree to identify (local) outlining candidates within a module.
This mode is also used by (full)LTO to maintain optimal behavior with
the combined module.
2. CGDataMode::Write (`-codegen-data-generate`): This mode is identical
to the default mode, but it also publishes the stable hash sequences of
instructions in the outlined functions into a local outlined hash tree.
It then encodes this into the `__llvm_outline` section, which will be
dead-stripped at link time.
3. CGDataMode::Read (`-codegen-data-use-path={.cgdata}`): This mode
reads a codegen data file (.cgdata) and initializes a global outlined
hash tree. This tree is used to generate global outlining candidates.
Note that the codegen data file has been post-processed with the raw
`__llvm_outline` sections from all native objects using the
`llvm-cgdata` tool (or a linker, `LLD`, or a new ThinLTO pipeline
later).
This depends on https://github.com/llvm/llvm-project/pull/105398. After
this PR, LLD (https://github.com/llvm/llvm-project/pull/90166) and Clang
(https://github.com/llvm/llvm-project/pull/90304) will follow for each
client side support.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
Merge GlobalISel's isTriviallyDead and DeadMachineInstructionElim's
isDead code and remove all unnecessary checks from the hot path by
looping over the operands before doing any other checks.
See #105950 for why DeadMIElim needs to remove LIFETIME markers even
though they probably shouldn't generally be considered dead.
x86 CTMark O3: -0.1%
AArch64 GlobalISel CTMark O0: -0.6%, O2: -0.2%
In degenerate but legal inputs, we can have functions that have no source
locations at all -- all the DebugLocs attached to instructions are empty.
LLVM didn't produce any source location for the function; with this patch
it will at least emit the function-scope source location. Demonstrated by
empty-line-info.ll
The XCOFF test modified has similar symptoms -- with this patch, the size
of the ".dwline" section grows a bit, thus shifting some of the file
internal offsets, which I've updated.